Abstract



We develop a model of issue-specific voting behavior. This model can be used
to explore lawmakers' personal voting patterns by issue area, providing an
exploratory window into how the language of the law is correlated with political
support. We derive approximate posterior inference algorithms based on variational
methods. Across 12 years of legislative data, we demonstrate both improvement in heldout
prediction performance and the model's utility in interpreting an inherently
multidimensional space.

Key words: Item response theory, Probabilistic topic model, Variational inference, Legislative 
voting 



1. INTRODUCTION 



Legislative behavior centers around the votes made by lawmakers. These votes are captured 
in roll call data, a matrix with lawmakers in the rows and proposed legislation in the columns. 
We illustrate a sample of roll call votes for the United States Senate in Figure 1.

The seminal work of Poole and Rosenthal (1985) introduced the ideal point model, using
roll call data to infer the latent political positions of the lawmakers. The ideal point model
is a latent factor model of binary data and an application of item-response theory (Lord
1980) to roll call data. It gives each lawmaker a latent political position along a single
dimension and then uses these points (called the ideal points) in a model of the votes. (Two
lawmakers with the same position will have the same probability of voting in favor of each
bill.) From roll call data, the ideal point model recovers the familiar division of Democrats
and Republicans. See Figure 2 for an example.

Example roll call votes

                                      Item of legislation
Lawmaker               S. 3930   H.R. 5631   H.R. 6061   H.R. 5682   S. 3711
Mitch McConnell (R)    Yea       Yea         Yea         Yea         Yea
Olympia Snowe (R)      -         Yea         Yea         Yea         Nay
John McCain (R)        Yea       Yea         Yea         Yea         Yea
Patrick Leahy (D)      Nay       Yea         Nay         Nay         Nay
Paul Sarbanes (D)      Nay       Yea         Nay         Yea         Nay
Debbie Stabenow (D)    Yea       Yea         Yea         Yea         Yea

Figure 1: A sample roll-call matrix illustrating lawmakers' votes on items of legislation.
These votes are from the Senate in the 109th Congress (2005-2006). The party of each
Senator - (D)emocrat or (R)epublican - is provided in parentheses. This matrix is sometimes
incomplete (see Snowe's vote on S. 3930, shown as a dash, for example).

Figure 2: Traditional ideal points separate Republicans (red) from Democrats (blue).

[Figure 3 panels: "Incorrect votes by classic ideal point" (top) and "Incorrect votes by
issue-adjusted ideal point" (bottom).]

Figure 3: Classic ideal points (top) represent votes incorrectly when lawmakers hold
issue-specific opinions, while issue-adjusted ideal points (bottom) can account for this. Classic
ideal points assume that lawmakers hold fixed positions, while issue-adjusted ideal points
allow their positions to change by issue. Each point above is the ideal point of a lawmaker
voting on an act Recognizing the significant accomplishments of AmeriCorps [and raising
community service] (H.R. 1338 in Congress 111); orange points represent lawmakers who
voted "Yea", and violet points represent lawmakers who voted "Nay" on this bill. The
theory behind classic ideal points assumes that lawmakers' votes on a bill can be described
by their side of the cut point (black vertical line). Red lines mark lawmakers whose votes
were incorrectly predicted with each model.

Ideal point models can capture the broad political structure of a body of lawmakers, but
they cannot tell the whole story. We illustrate this with votes on a bill in Figure 3. This
figure shows lawmakers' ideal points for their votes on an act Recognizing the significant
accomplishments of AmeriCorps, H.R. 1338 in Congress 111. In this figure, "Yea" votes are
colored orange, while "Nay" votes are violet; a classic ideal point model predicted that votes
to the right of the vertical line were "Nay" while those to the left were "Yea". Out of four
hundred eight votes on this bill modeled by an ideal point model, thirty-one were
modeled incorrectly.

Sometimes these votes are incorrectly predicted because of stochastic circumstances
surrounding lawmakers and bills. More often, however, these votes can be explained because
lawmakers are not one-dimensional: they each hold positions on different issues. For example,
Ronald Paul, a Republican representative from Texas, and Dennis Kucinich, a Democratic
representative from Ohio, hold consistent political opinions that an ideal point model
systematically gets incorrect. Looking more closely at these errors, we would see that Paul differs
from a typical Republican when it comes to foreign relations and social issues; Kucinich
differs from a typical Democrat when it comes to foreign policy.

The problem is that classical ideal point models place each lawmaker in a single political 
position, but a lawmaker's vote on a bill has to do with a number of factors — her political 
affiliation, the content of the proposed legislation, and her political position on that content. 
While classical ideal point models can capture the main regularities in lawmakers' voting 
behavior, they cannot predict when and how a lawmaker will vote differently than we expect. 

In this paper, we develop the issue-adjusted ideal point model, a model that captures
issue-specific deviation in lawmaker behavior. We place the lawmakers on a political spectrum and
identify how they deviate from their position as a function of specific issues. This results in
inferences like those illustrated in Figure 3. An important component of our model is that we
use the text of the proposed bills to encode which issues they are about. (We do this through
a probabilistic topic model (Blei et al. 2003).) Unlike other attempts at developing
multidimensional ideal point models (Jackman 2001), our approach explicitly ties the additional
dimensions to the political discussion at hand.

By incorporating issues, we can model the AmeriCorps bill above much better than we
could with classic ideal points (see Figure 3). By recognizing that this bill is about social
services, and by modeling lawmakers' positions on this issue, we are able to predict all but
one of the lawmakers' votes correctly. This is because we can learn to differentiate between
lawmakers who are conservative and lawmakers who are conservative on social services.
For example, the issue-adjusted model tells us that, while Doc Hastings (Republican of
Washington) is considered more conservative than Timothy Johnson (Republican of Illinois)
in the ideal point model, Hastings is much more liberal on social issues than Johnson; hence,
he will generally side with Democrats more often on those votes.

In the following sections, we describe our model and develop efficient approximate
posterior inference algorithms for computing with it. To handle the scale of the data we want
to study, we replace the usual MCMC approach with a faster variational inference
algorithm. We then study 12 years of legislative votes from the U.S. House of Representatives
and Senate, a collection of 1,203,009 votes. We show that our model gives a better fit to
the data than a classical ideal point model and demonstrate that it provides an interesting
exploratory tool for analyzing legislative behavior.



Related work. Item response theory (IRT) has been used for decades in political science
(Clinton et al. 2004; Martin and Quinn 2002; Poole and Rosenthal 1985); see Fox (2010) for
an overview, Enelow and Hinich (1984) for a historical perspective, and Albert (1992) for
Bayesian treatments of the model. Some political scientists have used higher-dimensional
ideal points, where each legislator is described by a vector of ideal points $x_u \in \mathbb{R}^K$ and
each bill polarization $a_d$ (i.e., how divisive it is) takes the same dimension $K$ (Heckman and
Snyder 1996). The probability of a lawmaker voting "Yes" is $\sigma(x_u^\top a_d + b_d)$ (we describe
these assumptions further in the next section). The principal component of ideal points
explains most of the variance and explains party affiliation. However, other dimensions
are not attached to issues, and interpreting beyond the principal component is painstaking
(Jackman 2001).

At the minimum, this painstaking analysis often requires careful study of the original 
roll-call votes or study of lawmakers' ideal-point neighbors. The former obviates an IRT 
model, since we cannot make inferences from model parameters alone; while the latter begs 
the question, since it assumes we know in the first place how lawmakers vote on different 
issues. The model we discuss in this paper is intended to address this problem by providing 
interpretable multi-dimensional ideal points. Through posterior inference, we can estimate 
each lawmaker's political position and how it changes on a variety of concrete issues. 

The model we will outline takes advantage of recent advances in content analysis, which 
have received increasing attention because of their ability to incorporate large collections of 
text at a relatively small cost (see Grimmer and Stewart (2012) for an overview of these
methods). For example, Quinn et al. (2006) used text-based methods to understand how
legislators' attention was being focused on different issues, to provide empirical evidence 
toward answering a variety of questions in the political science community. 

We will draw heavily on content analytic methods in the machine learning community, 
which has developed useful tools for modeling both text and the behavior of individuals 
toward items. Recent work in this community has provided joint models of legislative text 
and votes. Gerrish and Blei (2011) aimed to predict votes on bills which had not yet received
any votes. This model fitted predictors of each bill's parameters using the bill's text, but
the underlying voting model was still one-dimensional: it could not model individual votes
better than a one-dimensional ideal point model. In other work, Wang et al. (2010) developed
a Bayesian nonparametric model of votes and text over time. Both of these models have 
different purposes from the model presented here; neither addresses individuals' affinity 
toward different types of bills. 

The issue-adjusted model is conceptually more similar to recent models for content
recommendation. Specifically, Wang and Blei (2011) describe a method to recommend academic
articles to users of a service based on what they have already read, and Agarwal and Chen
(2010) proposed a similar model to match users to other items (i.e., Web content). Our
model is related to these approaches, but it is specifically designed to analyze political data.
These works, like ours, model users' affinities to items. However, neither of them employs
the notion of the orientation of an item (i.e., the political orientation of a bill) or the notion
that the users (i.e., lawmakers) have a position on this spectrum. These considerations
are required when analyzing political roll call data.

2. THE ISSUE-ADJUSTED IDEAL POINT MODEL



We first review ideal point models of legislative roll call data and discuss their limitations. 
We then present our model, the issue-adjusted ideal point model, that accounts for how 
legislators vote on specific issues. 



2.1. Modeling Political Decisions with Ideal Point Models 



Ideal point models are latent variable models that have become a mainstay in quantitative
political science. These models are based on item response theory, a statistical theory that
models how members of a population judge a set of items (see Fox (2010) for an overview).
Applied to voting records, ideal point models place lawmakers on an interpretable political
spectrum. They are widely used to help characterize and understand historical legislative
and judicial decisions (Clinton et al. 2004; Poole and Rosenthal 1985; Martin and Quinn
2002).



One-dimensional ideal point models posit an ideal point $x_u \in \mathbb{R}$ for each lawmaker $u$.
Each bill $d$ is characterized by its polarity $a_d$ and its popularity $b_d$. (The polarity is often
called the "discrimination", and the popularity is often called the "difficulty"; polarity and
popularity are more accurate terms.) The probability that lawmaker $u$ votes "Yes" on bill $d$
is given by the logistic regression

$$p(v_{ud} = \text{yes} \mid x_u, a_d, b_d) = \sigma(x_u a_d + b_d), \tag{1}$$

where $\sigma(s) = \exp(s)/(1 + \exp(s))$ is the logistic function. (A probit function is sometimes
used instead of the logistic. This choice is based on an assumption in the underlying model,
but it has little empirical effect in legislative ideal point models.) When the popularity $b_d$ of
a bill is high, nearly everyone votes "Yes"; when the popularity is low, nearly everyone
votes "No". When the popularity is near zero, the probability that a lawmaker votes "Yes"
is determined primarily by how her ideal point $x_u$ interacts with bill polarity $a_d$.
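
The behavior of Equation 1 under high, low, and zero popularity can be checked numerically. The following is a minimal sketch rather than the authors' implementation; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def logistic(s: float) -> float:
    """Logistic function sigma(s) = exp(s) / (1 + exp(s)), computed stably."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def p_yes(x_u: float, a_d: float, b_d: float) -> float:
    """Equation 1: probability that lawmaker u votes "Yes" on bill d."""
    return logistic(x_u * a_d + b_d)


# Hypothetical values, for illustration only.
popular = p_yes(x_u=0.5, a_d=1.0, b_d=6.0)     # high popularity: close to 1
unpopular = p_yes(x_u=0.5, a_d=1.0, b_d=-6.0)  # low popularity: close to 0
left = p_yes(x_u=-2.0, a_d=1.5, b_d=0.0)       # zero popularity: the sign of
right = p_yes(x_u=2.0, a_d=1.5, b_d=0.0)       # x_u * a_d decides the vote
```

With zero popularity, two lawmakers at mirrored positions receive complementary probabilities, which is the sense in which the ideal point and polarity alone determine the vote.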

In Bayesian ideal point modeling, the variables $a_d$, $b_d$, and $x_u$ are usually assigned
standard normal priors (Clinton et al. 2004). Given a matrix of votes $v = \{v_{ud}\}$, we can estimate
the posterior expectation of the ideal point of each lawmaker, $\mathbb{E}[x_u \mid v]$. Figure 2 illustrates
ideal points estimated from votes in the U.S. House of Representatives from 2009-2010.
The model has clearly separated lawmakers by their political party (color) and provides an
intuitive measure of their political leanings.



2.2. Limitations of Ideal Point Models 

The ideal point model fit to the House of Representatives from 2009-2010 correctly models 
98% of all lawmakers' votes on training data. (We correctly model an observed vote if its 
probability under the model is bigger than 1/2.) But it fits some lawmakers better than 
others. It only predicts 83.3% of Baron Hill's (D-IN) votes and 80.0% of Ronald Paul's 
(R-TX) votes. Why is this? 

[Figure 4 axis labels: "Ideal point" (top row of each panel) and issue-adjusted ideal point,
e.g. "Health-adjusted ideal point" (bottom row of each panel).]

Figure 4: In a traditional ideal point model, lawmakers' ideal points are static. In the
issue-adjusted ideal point model, lawmakers' ideal points change when they vote on certain
issues, such as taxation (top panel) and health (bottom panel). A line segment connects
select lawmakers' ideal points (top row of each panel) to their issue-adjusted ideal points
(bottom row of each panel). Unlabeled lawmakers are illustrated by the remaining, faint line
segments. We have colored Democrats blue and Republicans red.

To understand why, we look at how the ideal point model works. The ideal point model
assumes that lawmakers are ordered, and that each bill $d$ splits them at a cut point. The
cut point is a function of the bill's popularity and polarity, $-b_d/a_d$. Lawmakers with ideal
points $x_u$ to one side of the cut point are more likely to support the bill; lawmakers with
ideal points to the other side are more likely to reject it. The issue with lawmakers like Paul
and Hill, however, is that this assumption is too strong: their voting behavior does not fit
neatly into a single ordering. Rather, their location among the other lawmakers changes
with different bills.
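
The cut point follows from setting the probability in Equation 1 to 1/2, which happens exactly when the bill's polarity times the ideal point plus the popularity equals zero. A small sketch of this rule is given below; the function names and the bill parameters are hypothetical, chosen so that lawmakers to the left of the cut point vote "Yea".

```python
def cut_point(a_d: float, b_d: float) -> float:
    """Cut point -b_d / a_d of a bill; undefined when the polarity is zero."""
    if a_d == 0:
        raise ValueError("cut point is undefined when polarity a_d is zero")
    return -b_d / a_d


def predicted_vote(x_u: float, a_d: float, b_d: float) -> str:
    """Predict "Yea" when the Equation 1 probability exceeds 1/2,
    which is equivalent to x_u * a_d + b_d > 0."""
    return "Yea" if x_u * a_d + b_d > 0 else "Nay"


# Hypothetical bill with negative polarity: support comes from the left.
a_d, b_d = -2.0, 1.0
cut = cut_point(a_d, b_d)                     # 0.5
left_vote = predicted_vote(-1.0, a_d, b_d)    # lawmaker left of the cut point
right_vote = predicted_vote(1.0, a_d, b_d)    # lawmaker right of the cut point
```

Because the prediction depends only on which side of a single threshold a lawmaker falls, no choice of polarity and popularity can reproduce a vote pattern that breaks the one-dimensional ordering.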

However, there are still patterns to how they vote. Paul and Hill vote consistently within
individual areas of policy, such as foreign policy or education, though their voting on these
issues diverges from their usual position on the political spectrum. In particular, Paul
consistently votes against United States involvement in foreign military engagements, a position
that contrasts with other Republicans. Hill, a "Blue Dog" Democrat, is a strong supporter
of second-amendment rights, opposes same-sex adoption, and is wary of government-run
health care, positions that put him at odds with many other Democrats. As a result, the
ideal point model would predict Paul and Hill as having muted positions along the classic
left-right spectrum, when in fact they have different opinions about certain issues than their
fellow legislators.

We refer to voting behavior like this as issue voting. An issue is any federal policy area, 
such as "financial regulation," "foreign policy," "civil liberties," or "education," on which 
lawmakers are expected to take positions. Lawmakers' positions on these issues may diverge
from their traditional left/right stances, but traditional ideal point models cannot capture
this. Our goal is to develop an ideal point model that allows lawmakers to deviate, depending 
on the issue under discussion, from their usual political position. 

Figure 4 illustrates the kinds of hypotheses our model can make. Each panel represents
an issue; taxation is on the top, and health is on the bottom. Within each panel, the top line
illustrates the ideal points of various lawmakers; these represent the relative political
positions of each lawmaker for most issues. The bottom line illustrates the position adjusted for
the issue at hand. For example, the model posits that Charles Djou (Republican
representative for Hawaii) is more similar to Republicans on taxation and more similar to Democrats on
health, while Ronald Paul (Republican representative for Texas) is more Republican-leaning
on health and less extreme on taxation. Posterior estimates like this give us a window into
voting behavior that is not available to classic ideal point models.



2.3. Issue-adjusted Ideal Points

The issue-adjusted ideal point model is a latent variable model of roll call data. As with the
classical ideal point model, bills and lawmakers are attached to popularity, polarity, and ideal
points. In addition, the text of each bill encodes the issues it discusses and, for each vote,
the ideal points of the lawmakers are adjusted according to those issues. (We obtain issue
codes from text by using a probabilistic topic model. This is described below in Section 2.5.)

In more detail, each bill $d$ is associated with a polarity $a_d$ and a popularity $b_d$; each lawmaker
$u$ is associated with an ideal point $x_u$. Assume that there are $K$ issues in the political landscape,
such as finance, taxation, or health care. Each bill contains its text $w_d$, a collection of observed
words, from which we derive a $K$-vector of issue proportions $\theta(w_d)$. The issue proportions
represent how much each bill is about each issue. A bill can be about multiple issues (e.g., a
bill might be about the tax structure surrounding health care), but these values will sum to
one. Finally, each lawmaker is associated with a real-valued $K$-vector of issue adjustments
$z_u$. Each component of this vector describes how his or her ideal point changes as a function
of the issues being discussed. For example, a left-wing lawmaker may be more right wing on
defense; a right-wing lawmaker may be more left wing on social issues.

For the vote on bill $d$, we linearly combine the issue proportions $\theta(w_d)$ with each
lawmaker's issue adjustment $z_u$ to give an adjusted ideal point $x_u + z_u^\top \theta(w_d)$. The votes are
then modeled with a logistic regression,

$$p(v_{ud} \mid a_d, b_d, z_u, x_u, w_d) = \sigma\big((x_u + z_u^\top \theta(w_d))\, a_d + b_d\big). \tag{2}$$

We put standard normal priors on the ideal points, polarity, and popularity variables. We
use Laplace priors for the issue adjustments $z_u$,

$$p(z_{uk} \mid \lambda_1) \propto \exp\left(-\lambda_1 \|z_{uk}\|_1\right).$$

Using MAP inference, this finds sparse adjustments. With full Bayesian inference, it finds
nearly-sparse adjustments. Sparsity is desirable for the issue adjustments because we do
not expect each lawmaker to adjust her ideal point $x_u$ for every issue; rather, the issue
adjustments are meant to capture the handful of issues on which she does diverge.
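
To make the adjusted ideal point and the Laplace penalty concrete, the sketch below evaluates Equation 2 and the unnormalized log prior for one lawmaker. It is an illustration under assumed values, not the authors' code: the three-issue ordering (finance, defense, health), the function names, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def logistic(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def adjusted_ideal_point(x_u: float, z_u: Sequence[float],
                         theta_d: Sequence[float]) -> float:
    """Adjusted ideal point x_u + z_u^T theta(w_d)."""
    return x_u + sum(z * t for z, t in zip(z_u, theta_d))


def p_yes_adjusted(x_u: float, z_u: Sequence[float], theta_d: Sequence[float],
                   a_d: float, b_d: float) -> float:
    """Equation 2: "Yes" probability under the issue-adjusted model."""
    return logistic(adjusted_ideal_point(x_u, z_u, theta_d) * a_d + b_d)


def laplace_log_prior(z_u: Sequence[float], lam: float) -> float:
    """Unnormalized Laplace log prior, -lam * sum_k |z_uk|."""
    return -lam * sum(abs(z) for z in z_u)


# Hypothetical left-wing lawmaker who is more right wing on defense only.
x_u = -1.0
z_u = [0.0, 1.5, 0.0]          # issue order (assumed): finance, defense, health
defense_bill = [0.0, 1.0, 0.0]  # a bill that is only about defense

effective = adjusted_ideal_point(x_u, z_u, defense_bill)   # -1.0 + 1.5 = 0.5
penalty = laplace_log_prior(z_u, lam=2.0)                  # -2.0 * 1.5 = -3.0
classic = p_yes_adjusted(x_u, [0.0, 0.0, 0.0], defense_bill, a_d=1.0, b_d=0.0)
adjusted = p_yes_adjusted(x_u, z_u, defense_bill, a_d=1.0, b_d=0.0)
```

The penalty grows linearly in each nonzero adjustment, so an adjustment vector with a single active issue, as here, is cheap relative to one that moves the lawmaker on every issue.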

[Figure 5 node labels: bill content (LDA model); bill polarity and popularity; lawmaker
ideal point; lawmaker issue adjustments; observed votes.]

Figure 5: A graphical model for the issue-adjusted ideal point model, which models votes
$v_{ud}$ from lawmakers and legislative items. Lawmakers' positions are determined by $x_u$ and $z_u$,
a $K$-vector which interacts with bill-specific issue mixtures (also $K$-vectors). Issue mixtures
are fit from text using labeled latent Dirichlet allocation. As with ideal point models, $a_d$
and $b_d$ are bill-specific variables describing the bill's polarization and popularity.

Suppose there are $U$ lawmakers, $D$ bills, and $K$ issues. The generative probabilistic
process for the issue-adjusted ideal point model is the following.

1. For each user $u \in \{1, \ldots, U\}$:

(a) Draw ideal point $x_u \sim \mathcal{N}(0, 1)$.

(b) Draw issue adjustments $z_{uk} \sim \text{Laplace}(\lambda_1)$ for each issue $k \in \{1, \ldots, K\}$.

2. For each bill $d \in \{1, \ldots, D\}$:

(a) Draw polarity $a_d \sim \mathcal{N}(0, 1)$.

(b) Draw popularity $b_d \sim \mathcal{N}(0, 1)$.

3. Draw vote $v_{ud}$ from Equation 2 for each user/bill pair, $u \in \{1, \ldots, U\}$ and
$d \in \{1, \ldots, D\}$.
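
The generative process above can be simulated directly. The sketch below is a forward simulation only, written under stated assumptions: the issue proportions are passed in as given (they are not derived from text here), the function names are hypothetical, and Laplace($\lambda_1$) is read as the density proportional to $\exp(-\lambda_1 |z|)$, which is sampled as the difference of two exponential draws with rate $\lambda_1$.

```python
import math
import random


def logistic(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def sample_laplace(rng: random.Random, lam: float) -> float:
    """Draw from the density proportional to exp(-lam * |z|).

    The difference of two independent Exponential(lam) draws has this law.
    """
    return rng.expovariate(lam) - rng.expovariate(lam)


def simulate_votes(theta, n_lawmakers: int, lam: float, seed: int = 0):
    """Simulate the three steps of the generative process.

    theta: list of D rows, each a K-vector of issue proportions (assumed given).
    Returns ideal points x, adjustments z, polarities a, popularities b, and a
    U-by-D matrix of boolean votes (True means "Yes").
    """
    rng = random.Random(seed)
    n_bills, n_issues = len(theta), len(theta[0])
    # Step 1: lawmaker ideal points and issue adjustments.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_lawmakers)]
    z = [[sample_laplace(rng, lam) for _ in range(n_issues)]
         for _ in range(n_lawmakers)]
    # Step 2: bill polarity and popularity.
    a = [rng.gauss(0.0, 1.0) for _ in range(n_bills)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_bills)]
    # Step 3: votes from Equation 2.
    votes = []
    for u in range(n_lawmakers):
        row = []
        for d in range(n_bills):
            adjusted = x[u] + sum(z[u][k] * theta[d][k] for k in range(n_issues))
            row.append(rng.random() < logistic(adjusted * a[d] + b[d]))
        votes.append(row)
    return x, z, a, b, votes


# Hypothetical issue proportions for three bills over two issues.
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
x, z, a, b, votes = simulate_votes(theta, n_lawmakers=4, lam=2.0, seed=1)
```

Simulated data of this kind is mainly useful for checking an inference procedure, since the true latent variables are known.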

Figure 5 illustrates the graphical model. Given roll call data and bill texts, we can use
posterior expectations to estimate the latent variables. For each lawmaker, these are the expected
ideal points and per-issue adjustments; these are the posterior estimates we illustrated in
Figure 4. For each bill, these are the expected polarity and popularity.

We consider a simple example to better understand this model. Suppose a bill $d$ is
only about finance. This means that $\theta(w_d)$ has a one in the finance dimension and zero
everywhere else. With a classic ideal point model, a lawmaker $u$'s ideal point $x_u$ gives his
position on every bill, regardless of the issue. With the issue-adjusted ideal point model, his
effective ideal point for this bill is $x_u + z_{u,\text{Finance}}$, adjusting his position based on the bill's
content. The adjustment $z_{u,\text{Finance}}$ might move him to the right or the left, capturing an
issue-dependent change in his ideal point.

In the next section we will describe a posterior inference algorithm that will allow us to
estimate $x_u$ and $z_u$ from lawmakers' votes. An eager reader can scan ahead to browse these
effective ideal points for Ron Paul, Dennis Kucinich, and a handful of other lawmakers in
Figure 13. This figure shows the posterior mean of issue-adjusted ideal points that have been
inferred from votes about finance (top) and votes about congressional sessions (bottom).

In general, a bill might involve several issues; in that case the issue vector $\theta(w_d)$ will
include multiple positive components. We have not yet described this important function,
$\theta(w_d)$, which codes a bill with its issues. We describe that function in Section 2.5. First
we discuss the relationship between the issue-adjusted model and other models of political
science data.

2.4. Relationship to Other Models of Roll-call Data 

The issue-adjusted ideal point model recovers the classical ideal point model if all of the
adjustments (for all of the lawmakers) are equal to zero. In that case, as for the classical
model, each bill cuts the lawmakers at $-b_d/a_d$ to determine the probabilities of voting "yes."
With non-zero adjustments, however, the model asserts that the relative positions of
lawmakers can change depending on the issue. Different bill texts, through the coding function
$\theta(w_d)$, will lead to different orderings of the lawmakers. Again, Figure 4 illustrates these
re-orderings for idealized bills, i.e., those that are only about taxation or healthcare.

The issue-adjusted model is an interpretable multidimensional ideal point model. In
previous variants of multidimensional ideal point models, each lawmaker's ideal point $x_u$ and
each bill's polarity $a_d$ are vectors; the probability of a "yes" vote is $\sigma(x_u^\top a_d + b_d)$ (Heckman
and Snyder 1996; Jackman 2001). The principal dimension
invariably explains most of the variance, separating left-wing and right-wing lawmakers,
and subsequent dimensions capture other kinds of patterns in voting behavior. Researchers
developed these models to capture the complexity of politics beyond the left/right divide.
However, these models are difficult to use because (as for classical factor analysis) the
dimensions are not readily interpretable: nothing ties them to concrete issues such as Foreign
Policy or Defense (Jackman 2001). Our model circumvents the problem of interpreting
higher dimensions of ideal points.

The problem is that classical models only analyze the votes. To coherently bring issues 
into the picture, we need to include what the bills are about. Thus, the issue-adjusted model 
is a multidimensional ideal point model where each additional dimension is explicitly tied to 
a political issue. The language of the bills determines which dimensions are "active" when
modeling the votes. Unlike previous multidimensional ideal point models, we do not posit 
higher dimensions and then hope that they will correspond to known issues. Rather, we 
explicitly model lawmakers' votes on different issues by capturing how the issues in a bill 
relate to deviations from issue-independent voting patterns. 

2.5. Using Labeled LDA to Associate Bills with Issues 

We now describe the issue-encoding function $\theta$. This function takes the language of a bill
as input and returns a $K$-vector that represents the proportions with which each issue is
discussed. In particular, we use labeled latent Dirichlet allocation (Ramage et al. 2009). To
use this method, we estimate a set of "topics," i.e., distributions over words, associated with
an existing taxonomy of political issues. We then estimate the degree to which each bill
exhibits these topics. This treats the text as a noisy signal of the issues that it encodes, and
we can use both tagged bills (i.e., bills associated with a set of issues) and untagged bills to
estimate the model.

Top words in selected issues

Terrorism          Commemorations   Transportation   Education
terrorist          nation           transportation   student
September          people           minor            school
attack             life             print            university
nation             world            tax              charter school
york               serve            land             history
terrorist attack   percent          guard            nation
Hezbollah          community        coast guard      child
national guard     family           substitute       college

Figure 6: The eight most frequent words from topics fit using labeled LDA (Ramage et al.
2009).

Labeled LDA is a topic model, a model that assumes that our collection of bills can be
described by a set of themes, and that each bill in this collection is a bag-of-words drawn
from a mixture of those themes. The themes, called topics, are distributions over a fixed
vocabulary. In unsupervised LDA, and in many other topic models, these themes are fit to
the data (Blei et al. 2003; Blei 2012). Labeled LDA instead ties the themes to an
existing tagging scheme. Each tag is associated with a topic, and its distribution is found by
taking the empirical distribution of words for documents assigned to that tag, an approach
heavily influenced by, but simpler than, that of Ramage et al. (2009). This gives interpretable
names (the tags) to the topics. (We note that our method is readily applicable to the fully
unsupervised case, i.e., for studying a political history with untagged bills. However, such
analysis requires an additional step of interpreting the topics.)
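
The step of taking the empirical distribution of words for documents assigned to each tag can be sketched as follows. This covers only that counting step under simple assumptions (pre-tokenized bills, every token counted once per tag the bill carries); the function name and the toy bills are hypothetical, and any later smoothing of the counts is not shown.

```python
from collections import Counter, defaultdict


def tag_topics(bills):
    """Empirical word distribution for each tag.

    bills: iterable of (tokens, tags) pairs, where tokens is a list of words
    and tags is a list of issue labels assigned to the bill.
    Returns {tag: {word: probability}}.
    """
    counts = defaultdict(Counter)
    for tokens, tags in bills:
        for tag in tags:
            # A bill with several tags contributes its words to each of them.
            counts[tag].update(tokens)
    topics = {}
    for tag, word_counts in counts.items():
        total = sum(word_counts.values())
        topics[tag] = {word: n / total for word, n in word_counts.items()}
    return topics


# Toy, made-up bills for illustration.
bills = [
    (["school", "student", "school"], ["Education"]),
    (["tax", "school"], ["Education", "Taxation"]),
]
topics = tag_topics(bills)
```

In this toy input, "school" accounts for three of the five tokens assigned to the Education tag, so it receives probability 0.6 in that topic.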

We used tags provided by the Congressional Research Service (CRS 2012), a service that
provides subject codes for all bills passing through Congress. These subject codes describe
the bills using phrases which correspond to traditional issues, such as civil rights and national
security. Each bill may cover multiple issues, so multiple codes may apply to each bill. (Many
bills have more than twenty labels.) Figure 6 illustrates the top words from several of these
labeled topics. We then performed two iterations of unsupervised LDA (Blei et al. 2003)
with variational inference to smooth the word counts in these topics. We used 74 issues
in all (the most frequent issue labels); we summarize all 74 of them in Appendix B.1.

With topics in hand, we model each bill with a mixed-membership model: Each bill is
drawn from a mixture of the topics, but each one exhibits them with different proportions.
Denote the $K$ topics by $\beta_{1:K}$ and let $\alpha$ be a vector of Dirichlet parameters. The generative
process for each bill $d$ is:

1. Choose topic proportions $\theta_d \sim \text{Dirichlet}(\alpha)$.

2. For each word $n \in \{1, \ldots, N\}$:

(a) Choose a topic assignment $z_{d,n} \sim \theta_d$.

(b) Choose a word $w_{d,n} \sim \beta_{z_{d,n}}$.
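
The two-step process for a single bill can be simulated with the standard library alone. The following is a sketch under assumptions: the two topics, the vocabulary, and the Dirichlet parameters are made up for illustration, and the Dirichlet draw is obtained by normalizing independent gamma variates.

```python
import random


def sample_dirichlet(rng: random.Random, alpha):
    """Draw from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def sample_bill(rng: random.Random, beta, alpha, n_words: int, vocab):
    """Generate one bill: topic proportions, then a topic and a word per position.

    beta: list of K topics, each a list of word probabilities over vocab.
    """
    theta_d = sample_dirichlet(rng, alpha)               # step 1
    words = []
    for _ in range(n_words):                             # step 2
        k = rng.choices(range(len(beta)), weights=theta_d)[0]       # (a) topic
        w = rng.choices(range(len(vocab)), weights=beta[k])[0]      # (b) word
        words.append(vocab[w])
    return theta_d, words


# Hypothetical two-topic setup over a four-word vocabulary.
vocab = ["school", "student", "tax", "land"]
beta = [
    [0.6, 0.4, 0.0, 0.0],   # an education-like topic
    [0.0, 0.0, 0.7, 0.3],   # a taxation-like topic
]
rng = random.Random(0)
theta_d, words = sample_bill(rng, beta, alpha=[0.5, 0.5], n_words=20, vocab=vocab)
```

Inference runs this process in reverse: given the observed words and fixed topics, it estimates the proportions for each bill.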

The function $\theta(w_d)$ is the posterior expectation of $\theta_d$. It represents the degree to which the
bill exhibits the $K$ topics, where those topics are explicitly tied to political issues through
the congressional codes, and it is estimated using variational inference at the document
level (Blei et al. 2003). The topic modeling portion of the model is illustrated on the left
hand side of the graphical model in Figure 5.

We have completed our specification of the model. Given roll call data and bill texts,
we first compute the issue vectors for each bill. We then use these in the issue-adjusted
ideal point model of Figure 5 to infer each legislator's posterior ideal point and per-issue
adjustment. We now turn to the central computational problem for this model, posterior
inference.



3. POSTERIOR ESTIMATION 

Given roll call data and an encoding of the bills to issues, we form inferences and predictions
through the posterior distribution of the latent ideal points, issue adjustments, and bill
variables, $p(x, z, a, b \mid v, \theta)$. In the next section, we inspect this posterior to explore lawmakers'
positions about specific issues.

As for most interesting Bayesian models, this posterior is not tractable to compute;
we must approximate it. Approximate posterior inference for Bayesian ideal point models
is usually performed with MCMC methods, such as Gibbs sampling (Johnson and Albert
1999; Jackman 2001; Martin and Quinn 2002; Clinton et al. 2004). Here we will develop an
alternative algorithm based on variational inference. Variational inference tends to be faster
than MCMC, can handle larger data sets, and is attractive when fast Gibbs updates are not
available. In the next section, we will use variational inference to analyze twelve years of roll
call data.

3.1. Mean-field Variational Inference 

In variational inference we select a simplified family of candidate distributions over the
latent variables and then find the member of that family which is closest in KL divergence
to the posterior of interest (Jordan et al. 1999; Wainwright and Jordan 2008). This turns
the problem of posterior inference into an optimization problem. For posterior inference in
the issue-adjusted model, we use the fully-factorized family of distributions over the latent
variables, i.e., the mean-field family,

$$q(x, z, a, b \mid \eta) = \prod_{u=1}^{U} \mathcal{N}(x_u \mid \tilde{x}_u, \sigma_x^2)\, \mathcal{N}(z_u \mid \tilde{z}_u, \sigma_{z_u}^2) \prod_{d=1}^{D} \mathcal{N}(a_d \mid \tilde{a}_d, \sigma_a^2)\, \mathcal{N}(b_d \mid \tilde{b}_d, \sigma_b^2). \tag{3}$$

This family is indexed by the variational parameters $\eta = \{(\tilde{x}_u, \sigma_x), (\tilde{z}_u, \sigma_{z_u}), (\tilde{a}, \sigma_a), (\tilde{b}, \sigma_b)\}$,
which specify the means and variances of the random variables in the variational posterior.
While the model specifies priors over the latent variables, in the variational family each
instance of each latent variable, such as each lawmaker's issue adjustment for Taxation, is
endowed with its own variational distribution. This lets us capture data-specific marginals:
for example, that one lawmaker is more conservative about Taxation while another is more
liberal.

We fit the variational parameters to minimize the KL divergence between the variational
posterior and the true posterior. Once fit, we can use the variational means to form
predictions and posterior descriptive statistics of the lawmakers' issue adjustments. In ideal
point models, the means of a variational distribution can be excellent proxies for those of the
true posterior (Gerrish and Blei 2011).



3.2. The Variational Objective 

Variational inference proceeds by taking the fully-factorized distribution (Equation 3) and
successively updating the parameters $\eta$ to minimize the KL divergence between the
variational distribution (Equation 3) and the true posterior:

$$\hat{\eta} = \arg\min_{\eta} \mathrm{KL}\left( q_\eta(x, z, a, b) \,\|\, p(x, z, a, b \mid v) \right). \qquad (4)$$

This optimization is usually reformulated as the problem of maximizing a lower bound (found
via Jensen's inequality) on the log marginal probability of the observations:

$$\begin{aligned}
\log p(v) &= \log \int p(x, z, a, b, v)\, dx\, dz\, da\, db \\
&= \log \int q_\eta(x, z, a, b)\, \frac{p(x, z, a, b, v)}{q_\eta(x, z, a, b)}\, dx\, dz\, da\, db \\
&\geq \int q_\eta(x, z, a, b) \log \frac{p(x, z, a, b, v)}{q_\eta(x, z, a, b)}\, dx\, dz\, da\, db \\
&= \mathbb{E}_q[\log p(x, z, a, b, v)] - \mathbb{E}_q[\log q_\eta(x, z, a, b)] = \mathcal{L}_\eta. \qquad (5)
\end{aligned}$$

We follow the example of Braun and McAuliffe (2010) by referring to the lower bound
$\mathcal{L}_\eta$ as the evidence lower bound (ELBO).
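As a minimal illustration of the bound in Equation 5, and not of the issue-adjusted model itself, consider a hypothetical conjugate toy model with a single latent variable: $x \sim \mathcal{N}(0, 1)$ and $v \mid x \sim \mathcal{N}(x, 1)$. Here both the log evidence and the ELBO of a Gaussian $q = \mathcal{N}(m, s^2)$ are available in closed form, so the inequality can be checked directly. The function names and the toy model are assumptions made for this sketch.

```python
import math


def log_evidence(v):
    """Exact log p(v) for the toy model: marginally, v ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - v * v / 4.0


def elbo(v, m, s2):
    """ELBO of q(x) = N(m, s2): E_q[log p(x)] + E_q[log p(v | x)] + entropy of q."""
    e_log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m * m + s2)
    e_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((v - m) ** 2 + s2)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s2)
    return e_log_prior + e_log_lik + entropy


v = 1.2
# The exact posterior of the toy model is N(v / 2, 1 / 2); the bound is tight there.
gap_at_posterior = log_evidence(v) - elbo(v, v / 2.0, 0.5)
# Any other member of the family gives a strictly smaller ELBO.
gap_elsewhere = log_evidence(v) - elbo(v, 0.0, 1.0)
```

In this sketch the gap between the log evidence and the ELBO is the KL divergence of Equation 4, which is why maximizing the ELBO and minimizing the KL divergence are the same problem.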

For many models, the ELBO can be expanded as a closed-form function of the variational
parameters and then optimized with gradient ascent or coordinate ascent. However,
the issue-adjusted ideal point model does not allow for a closed-form objective. Previous
research on such non-conjugate models overcomes this by approximating the ELBO (Braun
and McAuliffe 2010; Gerrish and Blei 2011). Such methods are effective, but they require
many model-specific algebraic tricks and tedious derivations. Here we take an alternative
approach, where we approximate the gradient of the ELBO with Monte Carlo integration
and perform stochastic gradient ascent with this approximation. This gave us an easier way
to fit the variational objective for our complex model.

3.3. Optimizing the Variational Objective with a Stochastic Gradient 

We begin by computing the gradient of the ELBO in Equation 5. We rewrite it in terms
of integrals, then exchange the order of integration and differentiation, and apply the chain
rule:

$$\begin{aligned}
\nabla \mathcal{L}_\eta &= \nabla \int q_\eta(x, z, a, b) \left( \log p(x, z, a, b, v) - \log q_\eta(x, z, a, b) \right) dx\, dz\, da\, db \qquad (6) \\
&= \int \nabla \left[ q_\eta(x, z, a, b) \left( \log p(x, z, a, b, v) - \log q_\eta(x, z, a, b) \right) \right] dx\, dz\, da\, db \\
&= \int \left[ \nabla q_\eta(x, z, a, b) \left( \log p(x, z, a, b, v) - \log q_\eta(x, z, a, b) \right) - q_\eta(x, z, a, b)\, \nabla \log q_\eta(x, z, a, b) \right] dx\, dz\, da\, db.
\end{aligned}$$

Above we have assumed that the support of $q_\eta$ is not a function of $\eta$, and that
$\log q_\eta(x, z, a, b)$ and $\nabla \log q_\eta(x, z, a, b)$ are continuous with respect to $\eta$.

We can rewrite Equation 6 as an expectation by using the identity
$q_\eta(x) \nabla \log q_\eta(x) = \nabla q_\eta(x)$:

$$\nabla \mathcal{L}_\eta = \mathbb{E}_q\left[ \nabla \log q_\eta(x, z, a, b) \left( \log p(x, z, a, b, v) - \log q_\eta(x, z, a, b) - 1 \right) \right]. \qquad (7)$$



Next we use Monte Carlo integration to form an unbiased estimate of the gradient at
$\eta = \eta_0$. We obtain $M$ iid samples $(x_1, \ldots, x_M, \ldots, b_1, \ldots, b_M)$ from the
variational distribution $q_{\eta_0}$ for the approximation

$$\nabla \mathcal{L}_\eta \big|_{\eta_0} \approx \frac{1}{M} \sum_{m=1}^{M} \nabla \log q_\eta(x_m, z_m, a_m, b_m) \big|_{\eta_0} \left( \log p(x_m, z_m, a_m, b_m, v) - \log q_{\eta_0}(x_m, z_m, a_m, b_m) - C \right). \qquad (8)$$

We denote this approximation $\tilde{\nabla} \mathcal{L}_\eta |_{\eta_0}$. Note we replaced the 1 in
Equation 7 with a constant $C$, which does not affect the expected value of the gradient (this
follows because $\mathbb{E}_q[\nabla \log q_\eta(x, z, a, b)] = 0$). We discuss in the supplementary
materials how to set $C$ to minimize variance. Related estimates of similar gradients have been
studied in recent work (Carbonetto et al. 2009; Graves 2011; Paisley et al. 2012) and in the
context of expectation maximization (Wei and Tanner 1990).
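The Monte Carlo gradient estimate described above can be sketched on a one-dimensional toy problem. This is a hedged illustration under stated assumptions, not our implementation: the target `log_joint` is a hypothetical unnormalized Gaussian centered at 3, the variational family is $\mathcal{N}(\mu, 1)$ with only the mean as a free parameter, and the constant `baseline` plays the role of $C$. For this toy problem the exact ELBO gradient with respect to $\mu$ is $3 - \mu$, which gives something to compare against.

```python
import numpy as np


def log_joint(x):
    """Hypothetical unnormalized log p(x, v): a Gaussian bump centered at 3."""
    return -0.5 * (x - 3.0) ** 2


def log_q(x, mu):
    """Log density of the variational distribution N(mu, 1)."""
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)


def score(x, mu):
    """Gradient of log q(x; mu) with respect to mu, for unit variance."""
    return x - mu


def elbo_gradient_estimate(mu, num_samples, rng, baseline=1.0):
    """Average of score * (log p - log q - C) over samples drawn from q."""
    x = rng.normal(loc=mu, scale=1.0, size=num_samples)
    weights = log_joint(x) - log_q(x, mu) - baseline
    return np.mean(score(x, mu) * weights)


rng = np.random.default_rng(0)
mu0 = 0.0
estimate = elbo_gradient_estimate(mu0, 200_000, rng)
# Changing the constant C leaves the expectation unchanged; only the variance differs.
estimate_other_baseline = elbo_gradient_estimate(mu0, 200_000, rng, baseline=0.0)
exact = 3.0 - mu0
```

The estimator needs only samples from $q$, the score $\nabla \log q$, and pointwise evaluations of the log joint, which is what makes it convenient for models without a closed-form ELBO.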



Using this method for finding an approximate gradient, we optimize the ELBO with
stochastic optimization (Robbins and Monro 1951; Spall 2003; Bottou and Cun 2004).
Stochastic optimization follows noisy estimates of the gradient with a decreasing step-size.
While stochastic optimization alone is sufficient to achieve convergence, it may take a long
time to converge. To improve convergence rates, we used two additional ideas: quasi-Monte
Carlo samples (which minimize variance) and second-order updates (which eliminate the
need to select an optimization parameter). We provide details of these improvements in the
appendix.
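A minimal sketch of stochastic optimization with a decreasing step-size follows, again on a hypothetical one-dimensional toy problem rather than the issue-adjusted model. The target (an unnormalized Gaussian centered at 3), the family $\mathcal{N}(\mu, 1)$, and the step-size schedule $1/(t + 2)$ are all assumptions chosen for illustration; the sketch uses plain Monte Carlo samples and first-order updates, not the quasi-Monte Carlo samples or second-order updates mentioned above.

```python
import numpy as np


def noisy_elbo_gradient(mu, num_samples, rng, baseline=1.0):
    """Score-function estimate of the ELBO gradient for q = N(mu, 1)."""
    x = rng.normal(loc=mu, scale=1.0, size=num_samples)
    log_p = -0.5 * (x - 3.0) ** 2  # hypothetical unnormalized log joint
    log_q = -0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return np.mean((x - mu) * (log_p - log_q - baseline))


def fit_mean(num_steps=500, num_samples=100, seed=0):
    """Stochastic gradient ascent with step-sizes 1 / (t + 2)."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    for t in range(num_steps):
        # The steps sum to infinity while their squares have a finite sum.
        step = 1.0 / (t + 2.0)
        mu += step * noisy_elbo_gradient(mu, num_samples, rng)
    return mu


mu_hat = fit_mean()
```

For this toy target the ELBO is maximized at $\mu = 3$, so `mu_hat` should land close to 3 despite every individual gradient being noisy.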

Let us return briefly to the problem that motivated this section. Our goal is to estimate
the mean of the hidden random variables, such as lawmakers' issue adjustments $z$, from
their votes on bills. We achieved this by variational Bayes, which amounts to maximizing
the ELBO (Equation 5) with respect to the variational parameters. This maximization is
achieved with stochastic optimization on Equation 8. In the next section we will empirically
study these inferred variables (i.e., the expectations induced by the variational distribution)
to better understand distinctive voting behavior.

4. ISSUE ADJUSTMENTS IN THE UNITED STATES CONGRESS



We used the issue-adjusted ideal point model to study the complete roll call record from the
United States Senate and House of Representatives during the years 1999-2010. We report
on this study in this and the next section. We first evaluate the model's fit to these data,
confirming that issue adjustments give a better model of roll call data and that the encoding
of bills to issues is responsible for the improvement. We then use our inferences to give a
qualitative look at U.S. lawmakers' issue preferences, demonstrating how to use our richer
model of lawmaker behavior to explore a political history.



4.1. The United States Congress from 1999-2010 

We studied U.S. Senate and House of Representatives roll-call votes from 1999 to 2010. This
period spanned Congresses 106 to 111, during the majority of which Republican President
George W. Bush held office. Bush's inauguration and the attacks of September 11th, 2001
marked the first quarter of this period, followed by the wars in Iraq and Afghanistan. Democrats
gained a significant share of seats from 2007 to 2010, taking the majority from Republicans
in both the House and the Senate. Democratic President Barack Obama was inaugurated
in January 2009.

Roll-call votes are recorded when at least one lawmaker wants an explicit record of the
votes on the bill. For a lawmaker, such records are useful to demonstrate his or her positions
on issues. Roll calls serve as an incontrovertible record for any lawmaker who wants one.
We downloaded both roll-call tables and bills from www.govtrack.us, a nonpartisan website
which provides records of U.S. Congressional voting. Not all bill texts were available, and
we ignored votes on bills that did not receive a roll call, but we had over one hundred for
each Congress. Figure 7 summarizes the statistics of our data.

We fit our models to two-year periods in the House and (separately) to two-year periods
in the Senate. Some bills received votes in both the House and Senate; in those cases, the
issue-adjusted model's treatment of the bill in the House was completely independent of its
treatment by the model in the Senate.



Vocabulary. To fit the labeled topic model to each bill, we represented each bill as a
vector of phrase counts. This "bag of phrases" is similar to the "bag of words" assumption
commonly used in natural language processing. To select this vocabulary, we considered
all phrases of length one word to five words. We then omitted content-free phrases such as
"and", "when", and "to the". The full vocabulary consisted of 5,000 n-grams (further details
of vocabulary selection are in Appendix B.2). We used these phrases to algorithmically define
topics and assign issue weights to bills as described in Section 2.5.
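The "bag of phrases" representation can be sketched as follows. This is an illustrative assumption about the counting step only: the whitespace tokenizer, the function name `phrase_counts`, and the three-entry stop list are hypothetical, and the sketch does not reproduce the selection of the final 5,000 n-grams described in Appendix B.2.

```python
from collections import Counter

# Hypothetical stop list, using the content-free examples quoted in the text.
STOP_PHRASES = {"and", "when", "to the"}


def phrase_counts(text, max_len=5, stop_phrases=STOP_PHRASES):
    """Count every phrase of one to max_len consecutive words, skipping stop phrases."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_len + 1):
        for start in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[start:start + n])
            if phrase not in stop_phrases:
                counts[phrase] += 1
    return counts


counts = phrase_counts("to the post office and to the senate")
```

Each bill then becomes a sparse vector indexed by phrase, in the same way a bag-of-words vector is indexed by word.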



Identification. When using ideal-point models for interpretation, we must address the
issue of identification. The signs of ideal points $x_u$ and bill polarities $a_d$ are arbitrary,
for example, because $x_u a_d = (-x_u)(-a_d)$. This leads to a multimodal posterior (Jackman
2001). We address this by flipping ideal points and bill polarities if necessary to follow the
convention that Republicans are generally on the right (positive on the line) and Democrats
are generally on the left (negative on the line).

Figure 7: Roll-call data sets used in the experiments. These counts include votes in both
the House and Senate. Congress 107 had fewer votes than the remaining congresses in
part because this period included large shifts in party power, in addition to the attacks on
September 11th, 2001. The number of lawmakers within each House and Senate varies by
congress because there was some turnover within each Congress. In addition, some lawmakers
never voted on legislation in our experiments (recall, we used legislation for which both text
was available and for which the roll-call was recorded).



Statistics for the U.S. Senate

Congress   Years       Lawmakers   Bills   Votes
106        1999-2000   81          101     7,612
107        2001-2002   78          76      5,547
108        2003-2004   101         83      7,830
109        2005-2006   102         74      7,071
110        2007-2008   103         97      9,019
111        2009-2010   110         62      5,936

Statistics for the U.S. House of Representatives

Congress   Years       Lawmakers   Bills   Votes
106        1999-2000   437         345     142,623
107        2001-2002   61          360     18,449
108        2003-2004   440         490     200,154
109        2005-2006   441         458     187,067
110        2007-2008   449         705     287,645
111        2009-2010   446         810     330,956

4.2. Ideal Point Models vs. Issue-adjusted Ideal Point Models



The issue-adjusted ideal point model in Equation 2 is a generalization of the traditional
ideal point model (see Section 2.4). Before using this more complicated model to explore
our data, we empirically justify this increased complexity. We first outline empirical
differences between issue-adjusted ideal points and traditional ideal points. We then report on a
quantitative validation of the issue-adjusted model.



Examples: adjusting for issues. To give a sense of how the issue-adjusted ideal point
model works, Figure 8 gives a side-by-side comparison of traditional ideal points $x_u$ and
issue-adjusted ideal points $(x_u + z_u^\top \theta_d)$ for the ten most-improved bills of Congress 111
(2009-2010). For each bill, the top row shows the ideal points of lawmakers who voted "Yea" on
the bill and the bottom row shows lawmakers who voted "Nay". The top and bottom rows are a
partition of votes rather than separate treatments of the same votes. In a good model of
roll call data, these two sets of points will be separated, and the model can place the bill
parameters at the correct cut point. Over the whole data set, the cut point of the votes
improved in 14,347 heldout votes. (It got worse in 8,304 votes and stayed the same in 5.7M.)



Comparing issue-adjusted ideal points to traditional ideal points. The traditional
ideal point model (Equation 1) uses one variable per lawmaker, the ideal point $x_u$, to explain
all of her voting behavior. In contrast, the issue-adjusted model (Equation 2) uses $x_u$ along
with $K$ issue adjustments. Here we ask, how does $x_u$ under these two models differ?
We fit ideal points to the 111th House (2009 to 2010) and issue-adjusted ideal points to the
same period with regularization $\lambda = 1$.

The top panel of Figure 9 compares the classical ideal points to the global ideal points
from the issue-adjusted model. In this parallel plot, the top axis represents a lawmaker's
ideal point $x_u$ under the classical model, while the bottom axis represents his global
ideal point under the issue-adjusted model. (We will use plots like this again in this paper.
It is called a parallel plot, and it compares separate treatments of lawmakers. Lines between
the same lawmakers under different treatment are shaded based on their deviation from a
linear model to highlight unique lawmakers.) The ideal points in Figure 9 are similar; their
correlation coefficient is 0.998. The most noteworthy difference is that lawmakers appear
more partisan under the traditional ideal point model (enough that Democrats are completely
separated from Republicans by $x_u$), while issue-adjusted ideal points provide a softer
split.

This is not surprising, because the issue-adjusted model is able to use lawmakers'
adjustments to explain their votes. In fact, the political parties are better separated with issue
adjustments than they are by ideal points alone. We checked this by writing each lawmaker
$u$ as the vector $w_u := (x_u, z_{u,1}, \ldots, z_{u,K})$ and performing linear discriminant analysis
to find the vector $\beta$ which "best" separates lawmakers by party along $w_u^\top \beta$.

We illustrate lawmakers' projections $w_u^\top \beta$ along the discriminant vector $\beta$ in the
bottom panel of Figure 9 (we normalized the variance of these projections to match that of the
ideal points). The correlation coefficient between this prediction and political party is 0.979,
much higher than the correlation between ideal points $x_u$ and political party (0.921).
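The separating-vector check can be sketched with a two-class Fisher discriminant. Everything below is a hedged illustration on synthetic data: the party labels, ideal points, and adjustments are simulated, the function name `fisher_discriminant` is hypothetical, and the closed-form two-class solution stands in for whichever linear discriminant analysis routine one prefers.

```python
import numpy as np


def fisher_discriminant(features, labels):
    """Two-class Fisher discriminant: solve S_w beta = (mean_1 - mean_0)."""
    pos = features[labels == 1]
    neg = features[labels == 0]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    # Pooled within-class scatter matrix.
    within = (np.cov(pos, rowvar=False) * (len(pos) - 1)
              + np.cov(neg, rowvar=False) * (len(neg) - 1))
    within += 1e-8 * np.eye(features.shape[1])  # tiny ridge for numerical stability
    return np.linalg.solve(within, mean_diff)


# Synthetic stand-ins for ideal points x_u and issue adjustments z_u.
rng = np.random.default_rng(0)
num_lawmakers, num_issues = 400, 5
party = rng.integers(0, 2, size=num_lawmakers)  # 0 or 1
side = 2.0 * party - 1.0                        # -1 or +1
ideal = rng.normal(loc=side, scale=0.6)
adjust = rng.normal(loc=0.3 * side[:, None], scale=0.5,
                    size=(num_lawmakers, num_issues))

w = np.column_stack([ideal, adjust])            # rows are w_u = (x_u, z_u1, ..., z_uK)
beta = fisher_discriminant(w, party)

corr_ideal = np.corrcoef(ideal, party)[0, 1]
corr_projection = np.corrcoef(w @ beta, party)[0, 1]
```

Because the discriminant is fit and evaluated on the same lawmakers, the projection can never separate the parties worse than the ideal point alone; this is why a comparison against random adjustments of the same dimension is needed to rule out an improvement due only to the extra dimensions.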



[Figure 8 panels not recoverable from the extracted text. Columns: bill description, votes by
ideal point, votes by adjusted point. Bills shown: H. Res 806 (amending an education /
environment trust fund); Providing for conditional adjournment / recess of Congress; Establish
R&D program for gas turbines; Recognizing AmeriCorps and community service; Providing for
conditional adjournment of Congress; Providing for the sine die adjournment of Congress;
Providing for an adjournment / recess of Congress; Preventing child marriage in developing
countries; Providing for a conditional House adjournment; Congratulating UMD Men's
basketball.]

Figure 8: Issue-adjusted ideal points can explain votes better than standard ideal points.
The x-axis of each small plot shows ideal point or issue-adjusted ideal point for a lawmaker.
Each bill's indifference point $-b_d / a_d$ is shown as a vertical line. Positive votes (orange) and
negative votes (purple) are better-divided by issue-adjusted ideal points.

[Figure 9 panels not recoverable from the extracted text. Top panel: traditional ideal point
(top axis) and un-adjusted ideal point in issue-adjusted model (bottom axis). Bottom panel:
traditional ideal point (top axis) and separating vector in issue-adjusted model (bottom axis).]

Figure 9: Classic ideal points $x_u$ (top row, both figures) separate lawmakers
by party better than un-adjusted ideal points $x_u$ from the issue-adjusted model (bottom
row, top figure). The issue-adjusted model can still separate Republicans from Democrats
better than the ideal point model along a separating vector (bottom row, bottom figure).
In each figure, Republicans are colored red, and Democrats are blue. These ideal points
were estimated in the 111th House of Representatives. The line connecting ideal points from
each model has opacity proportional to the squared residuals in a linear model fit to predict
issue-adjusted ideal points from ideal points. The separating vector was defined using linear
discriminant analysis.



To be sure, some of this can be explained by random variation in the additional 74
dimensions. To check the extent of this improvement due only to dimension, we drew
random issue adjustments from normal random variables with the same variance as the
empirically observed issue adjustments. In 100 tests like this, the correlation coefficient was
higher than for classical ideal points, but not by much: 0.933 ± 0.004. Thus, the posterior
issue adjustments provide a signal for separating the political parties better than ideal points
alone. In fact, we will see in Section 5.2 that procedural votes driven by political ideology are
one of the factors driving this improvement.

Changes in bills' parameters. Bills' polarity $a_d$ and popularity $b_d$ are similar under
both the traditional ideal point model and the issue-adjusted model. We illustrate bills'
parameters in these two models in Figure 10 and note some exceptions.

First, procedural bills stand out from other bills in becoming more popular overall. In
Figure 10, procedural bills have been separated from traditional ideal points. We attribute
the difference in procedural bills' parameters to procedural cartel theory, which we describe
further in Section 5.2.

The remaining bills have also become less popular but more polarized under the
issue-adjusted model. This is because the issue-adjusted model represents the interaction between
lawmakers and bills with $K$ additional bill-specific variables, all of which are mediated by
the bill's polarity. This means that the model is able to depend more on bills' polarities
than bills' popularities to explain votes. For example, Donald Young regularly voted against
honorary names for regional post offices. These bills, usually very popular, would have
high popularity under the ideal point model. The issue-adjusted model also assigns high
popularity to these bills, but it takes advantage of lawmakers' positions on the postal facilities
issue to explain votes, decreasing reliance on the bill's popularity (postal facilities was more
common than 50% of other issues, including human rights, finance, and terrorism).

Figure 10: Procedural bills are more popular under the issue-adjusted voting model. Top:
popularity $b_d$ of procedural bills under the issue-adjusted voting model is greater than with
traditional ideal points. Bottom: consistent with Cox and Poole (2002) and procedural cartel
theory, the polarity of procedural bills is generally more extreme than that of non-procedural
bills. However, issue adjustments lead to increased polarity (i.e., certainty) among
non-procedural votes as well. The procedural issues include congressional reporting requirements,
government operations and politics, House of Representatives, House rules and procedure,
legislative rules and procedure, and Congress.



4.3. Evaluation of the Predictive Distribution 

We have described the qualitative differences between the issue-adjusted model and the 
traditional ideal point model. We now turn to a quantitative evaluation: Does the issue- 
adjusted model give a better fit to legislative data? 

We answer this question via cross validation and the predictive distribution of votes. For 
each session, we divide the votes, i.e., individual lawmaker/bill pairs, into folds. For each 
fold, we hold out the votes assigned to it, fit our models to the remaining votes, and then 
evaluate the log probability of the held out votes under the predictive distribution. A better 
model will assign higher probability to the held-out data. We compared several methods: 

1. The issue-adjusted ideal point model with topics found by labeled LDA: This is the
model and algorithm described above. We used a regularization parameter $\lambda = 1$. (See
Appendix A.3 for a study of the effect of regularization.)

2. The issue-adjusted ideal point model with explicit labels on the bills: Rather than infer
topics with labeled LDA, we used the CRS labels explicitly. If a bill contains $J$ labels,
we gave it weight $1/J$ at each of the corresponding components of the topic vector $\theta$.

3. The traditional ideal point model of Clinton et al. (2004): This model makes no
reference to issues. To manage the scale of the data, and keep the comparison fair, we
used variational inference. (In Gerrish and Blei (2011), we showed that variational
approximations find as good approximate posteriors as MCMC in ideal point models.)

4. A permuted issue-adjusted model: Here, we selected a random permutation $\pi \in S_D$
to shuffle the $D$ topic vectors $\theta_d \to \theta_{\pi(d)}$ and fit the issue-adjusted model with the
permuted vectors. This permutation test removes the information contained in matching
bills to issues, though it maintains the same empirical distribution over topic mixtures.
It can indicate that the improvement we see over traditional ideal points is due to the bills'
topics, not due to spurious factors (such as the change in dimension). In this method
we used five random permutations.

Figure 11: Average log-likelihood of heldout votes across all sessions for the House and
Senate. Log-likelihood was averaged across folds using six-fold cross validation for Congresses
106 to 111 (1999-2010) with regularization $\lambda = 1$. The variational distribution had higher
heldout log-likelihood for all congresses in both chambers than either



We summarize the results in Figure 11. In both chambers in all Congresses, the
issue-adjusted model represents heldout votes with higher log-likelihood than an ideal point
model. Further, every permutation represented votes with lower log-likelihood than the
issue-adjusted model. In most cases they were also lower than an ideal point model. These
results validate the additional complexity of the issue-adjusted ideal point model.
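The cross-validation procedure used in this section can be sketched as follows. This is a minimal illustration under stated assumptions: votes are stored as a flat 0/1 array with one entry per lawmaker/bill pair, `fit_and_predict` is a hypothetical callback that fits any model on the training votes and returns log-odds for the heldout votes, and the base-rate model is only a placeholder so that the sketch runs end to end. It is not the issue-adjusted model.

```python
import numpy as np


def vote_log_likelihood(votes, log_odds):
    """Bernoulli log-likelihood of 0/1 votes given log-odds of voting yes."""
    return votes * log_odds - np.logaddexp(0.0, log_odds)


def heldout_log_likelihood(votes, fit_and_predict, num_folds=6, seed=0):
    """Average heldout log-likelihood per vote over num_folds folds."""
    rng = np.random.default_rng(seed)
    # Assign each individual vote (lawmaker/bill pair) to a fold at random.
    fold_of = rng.permutation(len(votes)) % num_folds
    per_fold = []
    for fold in range(num_folds):
        test = fold_of == fold
        log_odds = fit_and_predict(~test, test)
        per_fold.append(vote_log_likelihood(votes[test], log_odds).mean())
    return float(np.mean(per_fold))


def base_rate_model(votes):
    """Placeholder model: predicts the training-set yes rate for every heldout vote."""
    def fit_and_predict(train_mask, test_mask):
        rate = np.clip(votes[train_mask].mean(), 1e-6, 1.0 - 1e-6)
        return np.full(int(test_mask.sum()), np.log(rate / (1.0 - rate)))
    return fit_and_predict


rng = np.random.default_rng(1)
votes = (rng.random(6000) < 0.7).astype(float)  # synthetic votes, about 70% yes
score = heldout_log_likelihood(votes, base_rate_model(votes))
```

Swapping a different `fit_and_predict` into the same loop gives a like-for-like comparison, since every model is scored on exactly the same heldout votes.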



5. EXPLORING ISSUES AND LAWMAKERS 

In the previous section, we demonstrated that the issue-adjusted IPM gives a better fit to
roll call data than the traditional ideal point model. While we used prediction to validate
the model, we emphasize that it is primarily an exploratory tool. As for the traditional ideal
point model, it is useful for summarizing and characterizing roll call data. In this section, we
demonstrate how to use the approximate posterior to explore a collection of bills, lawmakers,
and votes through the lens of the issue-adjusted model.

We will focus on the 111th Congress (2009-2010). First, we show on which issues the
issue-adjusted model best fits. We then discuss several specific lawmakers, showing voting patterns
that identify lawmakers who transcend their party lines. We finally describe procedural cartel
theory (Cox and Poole 2002), which explains why certain lawmakers have such different
preferences on procedural issues like congressional sessions than on substantive issues such as
finance.

5.1. Issues Improved by Issue Adjustment 

Which issues give the issue-adjusted model an edge over the traditional model? We measured
this with a metric we will refer to as issue improvement. Issue improvement is the weighted
improvement in log likelihood for the issue-adjusted model relative to the traditional model.
We formalize this by defining the log likelihood of each lawmaker's vote

$$J_{ud} = 1_{\{v_{ud} = \text{yes}\}}\, p - \log(1 + \exp(p)), \qquad (9)$$

where $p = (x_u + z_u^\top \theta_d) a_d + b_d$ is the log-odds of a vote under the issue-adjusted voting
model. We also measure the corresponding log-likelihood $I_{ud}$ under the ideal point model,
using $p = x_u a_d + b_d$. The improvement of issue $k$ is then the sum of the improvement in
log-likelihood, weighted by how much each vote represents issue $k$:

$$\mathrm{Imp}_k = \frac{\sum_{v_{ud}} \theta_{dk} \left( J_{ud} - I_{ud} \right)}{\sum_{v_{ud}} \theta_{dk}}. \qquad (10)$$

A high value of $\mathrm{Imp}_k$ indicates that issue $k$ is associated with an increase in log-likelihood,
while a low value is associated with a decrease in log-likelihood.
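A small sketch of the issue improvement computation follows. It assumes, as one reading of Equation 10, that the weighted sum of improvements is normalized by the total weight of each issue; the array layout (one entry per recorded vote, with index arrays `u` and `d` for the lawmaker and the bill) and all function and argument names are assumptions made for this illustration.

```python
import numpy as np


def vote_log_likelihood(vote_yes, log_odds):
    """Equation 9: 1{vote = yes} * p - log(1 + exp(p)), computed stably."""
    return vote_yes * log_odds - np.logaddexp(0.0, log_odds)


def issue_improvement(u, d, vote_yes, x, z, a, b, theta, x_ip, a_ip, b_ip):
    """Weighted improvement per issue k, normalized by each issue's total weight.

    u, d, vote_yes: one entry per recorded vote (lawmaker index, bill index, 0/1).
    x, z, a, b: issue-adjusted model estimates; x_ip, a_ip, b_ip: ideal point model.
    theta: (num_bills, K) issue weights of each bill.
    """
    adjusted_point = x[u] + np.sum(z[u] * theta[d], axis=1)
    j_ud = vote_log_likelihood(vote_yes, adjusted_point * a[d] + b[d])
    i_ud = vote_log_likelihood(vote_yes, x_ip[u] * a_ip[d] + b_ip[d])
    weights = theta[d]  # shape (num_votes, K)
    return (weights * (j_ud - i_ud)[:, None]).sum(axis=0) / weights.sum(axis=0)


# Tiny example: one lawmaker, two bills, two issues; bill 0 is entirely about
# issue 0 and bill 1 entirely about issue 1.
u = np.array([0, 0])
d = np.array([0, 1])
vote_yes = np.array([1.0, 1.0])
x = np.array([0.0])
z = np.array([[2.0, 0.0]])  # the lawmaker has an adjustment on issue 0 only
a = np.array([1.0, 1.0])
b = np.zeros(2)
theta = np.eye(2)
imp = issue_improvement(u, d, vote_yes, x, z, a, b, theta, x, a, b)
```

In this example the adjustment on issue 0 raises the probability of the observed "yes" vote on bill 0, so `imp[0]` is positive, while issue 1 has no adjustment and its improvement is exactly zero.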

We measured this for each issue in the 111th House. As this was an empirical question
about the entire House, we fit the model to all votes (in contrast to the analysis above, which
fit the model to five out of six folds, for each of the six folds).

We illustrate $\mathrm{Imp}_k$ for all issues in Figure 12. All issues increased log-likelihood; those
associated with the greatest increase tended to be related to procedural votes. In contrast,
women, religion, and military personnel issues are nearly unaffected by lawmakers' offsets.
For those issues, a global political spectrum (i.e., a single dimension) capably explains the
lawmakers' positions.

5.2. Exploring the Issue Adjustments 

The purpose of our model is to adjust lawmakers' ideal points according to the issues under
discussion. In this section, we demonstrate a number of ways to explore this information.

We begin with a brief summary of the main information obtained by this model. During
posterior inference, we jointly estimate the means $x_u$, $z_u$ of all lawmakers' positions and
lawmakers' issue adjustments. These issue adjustments $z_u$ adjust how we expect lawmakers
to vote given their un-adjusted ideal points $x$. We illustrate this for finance (a substantive
issue) and congressional sessions (a procedural issue) in Figure 13 and summarize all issues in
Figure 14. Upon a cursory inspection, it is clear that in some issues, lawmakers' adjustments
are relatively sparse, i.e., only a few lawmakers' adjustments are interesting. In other issues,
such as the procedural issue congressional sessions, these adjustments are more systemic.

Figure 12: Log-likelihood increases when using adjusted ideal points most for procedural
and strategic votes and less for issues frequently discussed during elections. $\mathrm{Imp}_k$ is shown
on the x-axis, while issues are spread on the y-axis for display. The size of each issue $k$ is
proportional to the logarithm of the weighted sum $\sum_{v_{ud}} \theta_{dk}$ of votes about the issue.

Figure 13: Ideal points $x_u$ and issue-adjusted ideal points $x_u + z_{uk}$ from the 111th House for
the substantive issue finance and the procedural issue congressional sessions. Democrats are
blue and Republicans are red. Votes about finance and congressional sessions were better
fit using issue-adjusted ideal points. For procedural votes such as congressional sessions,
lawmakers become more polarized by political party, behavior predicted by procedural cartel
theory (Cox and McCubbins 1993).



Adjustments by issue and party. Figure 15 illustrates the distribution across lawmakers
of the posterior issue adjustments (denoted $z_{uk}$) for issues with the highest and lowest
variance. This figure shows the distribution for the four issues with the greatest variation in
$z_{uk}$ (across lawmakers) and the four issues with the least variation. Note the systematic bias
in Democrats' and Republicans' issue preferences: they become more partisan on certain
issues, particularly procedural ones.



Controlling for ideal points. We found that posterior issue adjustments can correlate
with the ideal point of the lawmaker; for example, a typical Republican tends to have a
Republican offset on taxation. In some settings, we are more interested in understanding
when a Republican deviates from behavior suggested by her ideal point. We can shed light
on this systemic issue bias by explicitly controlling for it. To do this, we fit a regression for
each issue $k$ to explain away the effect of a lawmaker's ideal point $x_u$ on her offset $z_{uk}$:

$$z_k = \beta_k x + \epsilon,$$

where $\beta_k \in \mathbb{R}$. Instead of evaluating a lawmaker's observed offsets, we use her residual
$\hat{z}_{uk} = z_{uk} - \beta_k x_u$, which we call the corrected issue adjustment. By doing this, we can
evaluate lawmakers in the context of other lawmakers who share the same ideal points: a
positive offset $\hat{z}_{uk}$ for a Democrat means she tends to vote more conservatively about issue
$k$ than others with the same ideal point (most of whom are Democrats).

Figure 14: Ideal points $x_u$ and issue-adjusted ideal points $x_u + z_{uk}$ from the 111th House for
all issues.

Figure 15: Histogram of issue adjustments for selected issues. Democrats are in the left
column, and Republicans are in the right column. Both Democrats and Republicans tend
to have small issue adjustments for traditional issues. Their issue adjustments differ
substantially for procedural issues. A more-dispersed distribution of issue adjustments does
not mean that these lawmakers tend to feel differently from one another about these issues.
Instead, it means that lawmakers deviate from their ideal points more.

Figure 16: Significant issue adjustments for exceptional senators in Congress 111. Each
illustrated issue is significant to p < 0.05 by a permutation test.



Most issues had only a moderate relationship to ideal points. House rules and procedure
was the most-correlated with ideal points, moving the adjusted ideal point $\beta_k = 0.26$ right
for every unit increase in ideal point. Public land and natural resources and taxation followed
at a distance, moving an ideal point 0.04 and 0.025 respectively with each unit increase in
ideal point. Health, on the other hand, moved lawmakers $\beta_k = 0.04$ left for every unit increase
in ideal point. At the other end of the spectrum, the issues women, religion, and military
personnel were nearly unaffected by lawmakers' offsets.
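The correction for ideal points can be sketched as one least-squares fit per issue. The sketch assumes the regression has no intercept, as the regression equation is written; the function name `corrected_adjustments` and the synthetic data are hypothetical.

```python
import numpy as np


def corrected_adjustments(x, z):
    """Regress each issue's adjustments on ideal points and return the residuals.

    x: (U,) ideal points; z: (U, K) issue adjustments.
    Returns the slopes beta (K,) and the corrected adjustments (U, K).
    """
    beta = (x @ z) / (x @ x)  # no-intercept least-squares slope for each issue
    residual = z - np.outer(x, beta)
    return beta, residual


# Synthetic example: two issues whose adjustments drift with the ideal point.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
z = np.outer(x, [0.26, -0.04]) + 0.05 * rng.normal(size=(300, 2))
beta, z_corrected = corrected_adjustments(x, z)
```

By construction the corrected adjustments are uncorrelated with the ideal points in the least-squares sense, so a large residual marks a lawmaker who departs from others with a similar ideal point rather than from the chamber as a whole.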



Extreme lawmakers. We next use these corrected issue adjustments to identify lawmakers'
exceptional issue preferences. To identify adjustments which are significant, we turn
again to the same nonparametric check described in the last section: permute issue vectors'
document labels, i.e., $(\theta_1, \ldots, \theta_D) \mapsto (\theta_{\pi_i(1)}, \ldots, \theta_{\pi_i(D)})$, and refit lawmakers'
adjustments using both the original issue vectors and permuted issue vectors, for permutations
$\pi_1, \ldots, \pi_{20}$. We then compare a corrected issue adjustment $\hat{z}_{uk}$'s absolute value with
corrected issue adjustments estimated with permuted issue vectors $\theta_{\pi_i(d)k}$. This provides a
nonparametric method for finding issue adjustments which are more extreme than expected by
chance: an extreme issue adjustment has a greater absolute value than all of its permuted
counterparts. We use these to discuss several unique lawmakers.
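The final comparison step of this check can be sketched as below. The sketch assumes the model has already been refit once per permutation and that the resulting corrected adjustments are stacked into an array; the refitting itself is not shown, and the function name and array shapes are hypothetical.

```python
import numpy as np


def extreme_adjustments(observed, permuted):
    """Flag adjustments whose absolute value exceeds every permuted counterpart.

    observed: (U, K) corrected adjustments fit with the original issue vectors.
    permuted: (P, U, K) corrected adjustments, one refit per permutation.
    """
    return np.abs(observed) > np.abs(permuted).max(axis=0)


# Toy inputs: one lawmaker, two issues, 20 permutation refits spanning [-1, 1].
observed = np.array([[3.0, 0.1]])
permuted = np.linspace(-1.0, 1.0, 20)[:, None, None] * np.ones((20, 1, 2))
flags = extreme_adjustments(observed, permuted)
```

With 20 permutations, an adjustment that is exchangeable with its permuted counterparts has a 1 in 21 chance of being the largest in absolute value, which is the sense in which this rule corresponds to p < 0.05.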

Using corrected issue adjustments, we identified several of the most-unique lawmakers.
We focused this analysis on votes from 2009-2010, the most recent full session of Congress,
using $\lambda = 1$. We fit the variational approximation to all votes in the House and computed
lawmakers' corrected issue adjustments $\hat{z}_{uk}$, which are conditioned on their ideal points as
described in Section 5.2. Figure 16 illustrates those issue preferences which were significant
under 20 permutation replications (p < 0.05) for several lawmakers from this Congress.



Ron Paul. We return to Ron Paul, one of the most unique House Republicans, and a
lawmaker who first motivated this analysis. Paul's offsets were very extreme; he tended
to vote more conservatively than expected on health, human rights and international
affairs. He voted more liberally on social issues such as racial and ethnic relations, and
broke with behavior expected under a procedural cartel (congressional sessions). The
issue-adjusted training accuracy of Paul's votes increased from 83.8% to 87.9% with
issue offsets, placing him among the two most-improved lawmakers with this model.

The issue-adjusted improvement $\mathrm{Imp}_k$ (Equation 10) when restricted to Paul's votes
indicates significant improvement in international affairs and East Asia (he tends to vote
against U.S. involvement in foreign countries); congressional sessions; human rights;
and special months (he tends to vote against recognition of months as special holidays).

Donald Young. One of the most exceptional legislators in the 111th House was 
Donald Young, Alaska Republican. Young stood out most in a topic used frequently 
in House bills about naming local landmarks. In many cases, Young voted against the 
majority of his party (and the House in general) on a series of largely symbolic bills 
and resolutions. For example, in the commemorative events and holidays topic, Young 
voted (with only two other Republicans and against the majority of the House) not 
to commend "the members of the Agri-business Development Teams of the National 
Guard and the National Guard Bureau for their efforts... to modernize agriculture 
practices and increase food production in war-torn countries." 

Young's divergent symbolic voting was also evident in a series of votes against naming 
various landmarks — such as post offices — in a topic about such symbolic votes. Yet 
Donald Young's ideal point is -0.35, which is not particularly distinctive (see Figure [2]): 
using the ideal point alone, we would not recognize his unique voting behavior. 



Procedural Cartels. Above we briefly noted that Democrats and Republicans become 
more partisan on procedural issues. Lawmakers' more partisan voting on procedural issues 
can be explained by theories about partisan strategy in the House. In this section we 
summarize a theory underlying this behavior and note several ways in which it is supported 
by issue adjustments. 

The sharp contrast in voting patterns between procedural votes and substantive votes
has been noted and studied over the past century (Jr. 1965; Jones 1964; Cox and McCubbins
1993; Cox and Poole 2002). Cox and McCubbins (1993) provide a summary of this behavior:
"parties in the House — especially the majority party — are a species of 'legislative cartel' [ 
which usurp the power ] to make rules governing the structure and process of legislation." A
defining assumption made by Cox and McCubbins (2005) is that the majority party delegates
an agenda-setting monopoly to senior partners in the party, who set the procedural agenda 
in the House. As a result, the cartel ensures that senior members hold agenda-setting seats 
(such as committee chairs) while rank-and-file members of the party support agenda-setting 
decisions. 

This procedural cartel theory has withstood tests in which metrics of polarity were found
to be greater on procedural votes than substantive votes (Cox and McCubbins 1993; Cox and
Poole 2002; Cox and McCubbins 2005). We note that issue adjustments support this theory
in several ways. First, lawmakers' systematic bias for procedural issues was illustrated and
discussed in Section 5.2 (see Figure 15): Democrats systematically lean left on procedural
issues, while Republicans systematically lean right. Importantly, this discrepancy is more
pronounced among procedural issues than substantive ones. Second, lawmakers' positions
on procedural issues are more partisan than expected under the underlying un-adjusted ideal
points (see Section 5.2 and Figure 13). Finally, more extreme polarity and improved
prediction on procedural votes (see Section 4.3 and Figure 10) indicate that issue adjustments
for procedural votes are associated with more extreme party affiliation, as also observed by
Cox and Poole (2002).



6. SUMMARY 

We developed and studied the issue-adjusted ideal point model, a model designed to tease
apart lawmakers' preferences from their general political position. This is a model of roll- 
call data that captures how lawmakers vary, issue by issue. It gives a new way to explore 
legislative data. On a large data set of legislative history, we demonstrated that it is able to 
represent votes better than a classic ideal point model and illustrated its use as an exploratory 
tool. 

This work could be extended in several ways. One of the most natural ways is to incorporate
lawmakers' stated positions on issues, which may differ from how they actually vote on
these issues; in preliminary analyses, we have found little correlation to external sources. We
might also study lawmakers' activities outside of voting to understand their issue positions.
For example, lawmakers' fund-raising by industry area might (or might not) be useful in
predicting their positions on different issues. Additional work includes modeling how
lawmakers' positions on issues change over time, by incorporating time-series assumptions as in
Martin and Quinn (2002).
APPENDIX A. POSTERIOR INFERENCE



In this appendix we provide additional details for A Textual Issue Model for Legislative Roll
Calls. We begin by detailing the inference algorithm summarized in Section 3.

A.1. Optimizing the variational objective

Variational bounds are typically optimized by gradient ascent or block coordinate ascent, 
iterating through the variational parameters and updating them until the relative increase 
in the lower bound is below a specified threshold. Traditionally this would require symbolic 
expansion of the ELBO $\mathcal{L}_\eta = \mathbb{E}_q[\log p(x, v, z, \theta, a, b) - \log q_\eta(x, z, a, b)]$, so that the bound can
be optimized with respect to the variational parameters $\eta$. This expectation cannot be
analytically expanded with our model. One solution would be to approximate this bound. 
Especially when there are many variables, however, this approximation and the resulting 
optimization algorithm are complicated and prone to bugs. 

Instead of expanding this bound symbolically, we update each parameter with stochas- 
tic optimization. We repeat these updates for each parameter until the parameters have 
converged. Upon convergence, we use the variational means x and z to inspect lawmakers' 
issues and bill parameters a and b to inspect items of legislation. 

Without loss of generality, we describe how to perform the $n$th update on the variational
parameter $x$, assuming that we have the most-recent estimates of the variational parameters
$z_{n-1}$, $a_{n-1}$, and $b_{n-1}$. To motivate the inference algorithm, we first approximate the
ELBO $\mathcal{L}$ with a Taylor approximation, which we optimize. At the optimum, the Taylor
approximation is equal to the ELBO.

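Writing the variational objective as $\mathcal{L}(x) = \mathrm{KL}(q \| p)$ for notational convenience (where
all parameters in $\eta$ except $x$ are held fixed), we estimate the KL divergence as a function of
$x$ around our last estimate $x_{n-1}$ with its Taylor approximation

$$\mathcal{L}(x) \approx \mathcal{L}(x_{n-1}) + \left( \frac{\partial \mathcal{L}}{\partial x} \Big|_{x_{n-1}} \right)^{\top} \Delta x + \frac{1}{2} \Delta x^{\top} \left( \frac{\partial^2 \mathcal{L}}{\partial x \, \partial x^{\top}} \Big|_{x_{n-1}} \right) \Delta x,$$

where $\Delta x = x - x_{n-1}$. Once we have estimated the Taylor coefficients (as described in the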
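next paragraph), we can perform the update

$$x_n \leftarrow x_{n-1} - \left( \frac{\partial^2 \mathcal{L}}{\partial x \, \partial x^{\top}} \Big|_{x_{n-1}} \right)^{-1} \frac{\partial \mathcal{L}}{\partial x} \Big|_{x_{n-1}}.$$
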
We approximated the Taylor coefficients with Monte Carlo sampling. Without loss of
generality, we will illustrate this approximation with the variational parameter $x$. Let $x_{n-1}$
be the current estimate of the variational mean, $q_{x_{n-1}}(x, z, a, b)$ be the variational posterior
at this mean, and $\mathcal{L}_{x_{n-1}}$ be the ELBO at this mean. We then approximate the gradient with
Monte Carlo samples as

where we have taken the gradient through the integral using Leibniz's rule and used $M$
samples from the current estimate of the variational posterior. The second Taylor coefficient
is straightforward to derive with similar algebra:

where we increase $M$ as the model converges. Note that $C$ is a free parameter that we
can set without changing the final solution. We set $C$ to the average of $\log p(x_{n-1,m} \mid \ldots) - \log q_{n-1}(x_{n-1,m})$
across the set of $M$ samples.
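
To make the role of the samples and of the free constant $C$ concrete, the following Python sketch estimates the gradient of $\mathrm{KL}(q \| p)$ with respect to the mean of a one-dimensional Gaussian $q$. It is a sketch under stated assumptions rather than the estimator used for the reported results: it uses the standard score-function identity, iid samples instead of the quasi-Monte Carlo samples described below, and a hypothetical target `log_p`. The names `kl_gradient_estimate` and `log_normal_pdf` are ours.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def kl_gradient_estimate(log_p, q_mean, q_var, num_samples, rng):
    """Monte Carlo estimate of d KL(q || p) / d q_mean for q = N(q_mean, q_var)."""
    samples = [rng.gauss(q_mean, math.sqrt(q_var)) for _ in range(num_samples)]
    log_q = [log_normal_pdf(x, q_mean, q_var) for x in samples]
    log_p_vals = [log_p(x) for x in samples]
    # Free constant C: the average of log p - log q over the samples.
    c = sum(lp - lq for lp, lq in zip(log_p_vals, log_q)) / num_samples
    total = 0.0
    for x, lq, lp in zip(samples, log_q, log_p_vals):
        score = (x - q_mean) / q_var  # d log q(x) / d q_mean for a Gaussian
        total += score * (lq - lp + c)
    return total / num_samples


# Hypothetical target: p = N(0, 1); the exact gradient at q_mean = 1 is 1.
estimate = kl_gradient_estimate(
    lambda x: log_normal_pdf(x, 0.0, 1.0), 1.0, 1.0, 20000, random.Random(0)
)
print(estimate)
```

Because the expected value of the score is zero, adding $C$ leaves the expectation of the estimator unchanged while centering the weighted terms, which is why it can be set freely.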

Quasi-Monte Carlo samples. Instead of taking iid samples from the variational distribution
$q_{n-1}$, we used quasi-Monte Carlo sampling (Niederreiter 1992). By taking non-iid
samples from $q_{n-1}$, we are able to decrease the variance around estimates of the Taylor
coefficients. To select these samples, we took $M$ equally-spaced points from the unit interval,
passed these through the inverse CDF of the variational Gaussian $q_{n-1}(x)$, and used the
resulting values as samples. Note that these samples produce a biased estimate of Equation 11.
This bias decreases as the number of samples $N \to \infty$.
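
The sampling scheme in this paragraph can be sketched as follows. The sketch assumes the equally spaced points are the midpoints $(m + 0.5)/M$, which keeps them away from 0 and 1 where the inverse CDF is unbounded; the text does not specify the exact grid, and the function name is hypothetical.

```python
from statistics import NormalDist


def quasi_mc_gaussian_samples(mean, variance, num_samples):
    """Push equally spaced points in (0, 1) through the Gaussian inverse CDF."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    # Assumed grid: interval midpoints, so no point falls on 0 or 1.
    return [dist.inv_cdf((m + 0.5) / num_samples) for m in range(num_samples)]


# Hypothetical variational marginal: mean 0.3, variance exp(-5) (about 0.0067).
points = quasi_mc_gaussian_samples(0.3, 0.0067, 21)
print(points[0], points[10], points[-1])
```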

When we update the variational parameter $x_u$, we do not need to sample all random
variables, but we do need a sample of all random variables in the Markov blanket of $x_u$. The
cumulative distribution is of course ill-defined for multivariate distributions, so the method in
the last paragraph is not quite enough. For a quasi-Monte Carlo sample from the multivariate
distribution of $x_u$'s Markov blanket, we selected $M$ samples using the method in the previous
paragraph for each marginal in the Markov blanket of $x_u$. We then permuted each variable's
samples and combined them for $M$ multivariate samples $\{x_{n-1,m}, \ldots, b_{n-1,m}\}_m$ from the
current estimate $q_{n-1}$ of the variational distribution.
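
A minimal sketch of this multivariate construction is given below, assuming a factorized Gaussian over a small, hypothetical Markov blanket. The variable names, means, and variances are illustrative only, and the midpoint grid is the same assumption as in the univariate case.

```python
import random
from statistics import NormalDist


def marginal_points(mean, variance, num_samples):
    """Equally spaced quantiles of N(mean, variance) (assumed midpoint grid)."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return [dist.inv_cdf((m + 0.5) / num_samples) for m in range(num_samples)]


def joint_quasi_mc_samples(marginals, num_samples, rng):
    """Permute each marginal's points independently, then combine them row by row.

    `marginals` maps a variable name to its (mean, variance) under q.
    """
    columns = {}
    for name, (mean, variance) in marginals.items():
        points = marginal_points(mean, variance, num_samples)
        rng.shuffle(points)  # independent permutation for each variable
        columns[name] = points
    return [{name: columns[name][m] for name in columns} for m in range(num_samples)]


# Hypothetical blanket of one ideal point: x_u plus one bill's a_d and b_d.
blanket = {"x_u": (0.4, 0.0067), "a_d": (1.2, 0.0067), "b_d": (-0.3, 0.0067)}
samples = joint_quasi_mc_samples(blanket, num_samples=21, rng=random.Random(0))
print(len(samples), sorted(samples[0]))
```

Each marginal keeps its evenly spread set of points, while the independent permutations break the spurious dependence that would arise if the sorted points were paired directly.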

Estimating $\partial \log q_{n-1} / \partial x$. We estimate the gradients of $\log q$ above based on the distribution
of the variational marginals. We have defined the variational distribution to be factorized
Gaussians, so these take the form

$$\frac{\partial \log q_{n-1}(x_{n-1,m})}{\partial x_{n-1}} = \frac{x_{n-1,m} - x_{n-1}}{\sigma_x^2}.$$

We finally address practical details of implementing issue-adjusted ideal points.

A.2. Algorithmic parameters

We fixed the variance $\sigma_x^2$ to $\exp(-5)$. Allowing $\sigma_x$ to vary freely provides a better variational
bound at the expense of accuracy. This happens because the issue-adjusted model would
sometimes fit poor means to some parameters when the posterior variance was large: there
is little penalty for this when the variance is large. Low posterior variance $\sigma_x^2$ is similar to a
non-sparse MAP estimate.

These updates were repeated until the exponential moving average $\Delta_{\text{est}} \leftarrow 0.8\,\Delta_{\text{est}} + 0.2\,\Delta_{\text{obs}}$
of the change in KL divergence dropped below one and the number $N$ of samples
passed 500. If the moving average dropped below one and $N < 500$, we doubled the number
of samples.
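
The stopping rule can be sketched as a small loop. The following is an illustration under assumptions, not the original implementation: the moving average is initialized with the first observed change, it is not reset after the sample size doubles, "passed 500" is read as "at least 500", and the starting sample size of 21 is borrowed from Section A.3. The function name `run_schedule` is hypothetical.

```python
def run_schedule(kl_changes, initial_samples=21, threshold=1.0, max_samples=500):
    """Return (stopping step, sample size), or (None, sample size) if never converged.

    `kl_changes` is the sequence of observed changes in KL divergence per update.
    """
    moving_average = None
    num_samples = initial_samples
    for step, change in enumerate(kl_changes, start=1):
        if moving_average is None:
            moving_average = change  # assumed initialization
        else:
            moving_average = 0.8 * moving_average + 0.2 * change
        if moving_average < threshold:
            if num_samples >= max_samples:
                return step, num_samples  # converged with enough samples
            num_samples *= 2  # converged too early: double the sample size
    return None, num_samples


# Hypothetical run in which every update changes the KL divergence by 0.5.
print(run_schedule([0.5] * 10))
```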

When performing the second-order updates described in Section 3, we skipped variable
updates when the estimated Hessian was not positive definite (this disappeared when sample 
sizes grew large enough). We also limited step sizes to 0.1 (another possible reason for smaller 
coefficients). 
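
For a single scalar parameter, these two safeguards might look like the sketch below. It rests on two assumptions that the text does not settle: the update is a Newton-type step computed from the estimated first and second Taylor coefficients, and "limited step sizes to 0.1" means clipping the step to the interval [-0.1, 0.1]. The function name is hypothetical.

```python
def guarded_newton_step(x, gradient, curvature, max_step=0.1):
    """One safeguarded second-order update for a scalar variational mean."""
    if curvature <= 0.0:
        # Estimated second derivative is not positive: skip this update.
        return x
    step = -gradient / curvature
    step = max(-max_step, min(max_step, step))  # assumed clipping rule
    return x + step


print(guarded_newton_step(1.0, gradient=2.0, curvature=4.0))   # step is clipped
print(guarded_newton_step(1.0, gradient=0.2, curvature=4.0))   # full Newton step
print(guarded_newton_step(1.0, gradient=2.0, curvature=-1.0))  # update skipped
```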

A.3. Hyperparameter settings

The main parameter in the issue-adjusted model is the regularization term $\lambda$, which is shared
for all issue adjustments. The Bayesian treatment described in the Inference section of this
paper demonstrated considerable robustness to overfitting at the expense of precision. With
$\lambda = 0.001$, for example, issue adjustments $z_{uk}$ remained on the order of single digits, while
experiments with MAP estimates yielded adjustment estimates over 100.

[Figure: 109th U.S. House sensitivity to $\lambda$]

Figure A.1: Average log-likelihood of heldout votes by regularization $\lambda$. Log-likelihood
was averaged across folds using six-fold cross validation for Congress 109 (2005-2006). The
variational distribution represented votes with higher heldout log-likelihood than traditional
ideal points for $1 < \lambda < 10$. In a model fit with permuted issue labels (Perm. Issue), heldout
likelihood of votes was worse than traditional ideal points for all regularizations $\lambda$.

We report the effect of different $\lambda$ by fitting the issue-adjusted model to the 109th
Congress (2005-2006) of the House and Senate for a range $\lambda = 0.0001, \ldots, 1000$ of
regularizations. We performed 6-fold cross-validation, holding out one sixth of votes in each fold, and
calculated average log-likelihood $\sum_{v_{ud} \in V_{\text{heldout}}} \log p(v_{ud} \mid x_u, z_u, a_d, b_d)$ for votes $V_{\text{heldout}}$ in the
heldout set. Following the algorithm described in Section 3, we began with $M = 21$ samples
to estimate the approximate gradient (Equation 11) and scaled it by 1.2 each time the ELBO
dropped below a threshold, until it was 500. We also fixed variance $\sigma_x^2, \sigma_z^2, \sigma_a^2, \sigma_b^2 = \exp(-5)$.
We summarize these results in Table A.1.
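
The evaluation protocol can be illustrated with a short, model-agnostic sketch. It assumes votes are coded as 1 (Yea) or 0 (Nay) and that some fitted model supplies a probability of Yea for each heldout vote; the fold assignment, the function names, and the example numbers are hypothetical and are not taken from the experiments.

```python
import math
import random


def make_folds(num_votes, num_folds=6, seed=0):
    """Shuffle vote indices and deal them into disjoint heldout sets."""
    order = list(range(num_votes))
    random.Random(seed).shuffle(order)
    return [order[f::num_folds] for f in range(num_folds)]


def average_log_likelihood(votes, yes_probabilities, eps=1e-12):
    """Mean log-probability assigned to the observed votes (1 = Yea, 0 = Nay)."""
    total = 0.0
    for vote, p in zip(votes, yes_probabilities):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p if vote == 1 else 1.0 - p)
    return total / len(votes)


folds = make_folds(num_votes=12)
print([len(fold) for fold in folds])
print(average_log_likelihood([1, 0, 1], [0.9, 0.2, 0.6]))
```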

The variational implementation generalized well for the entire range, representing votes
best in the range $1 < \lambda < 10$. Log-likelihood dropped modestly for $\lambda < 1$. In the worst
case, log-likelihood was -0.159 in the House (this corresponds with 96% heldout accuracy)
and -0.242 in the Senate (93% heldout accuracy).

We recommend a modest value of $\lambda = 1$, and no greater than $\lambda = 10$. At this value, the
model outperforms ideal points in validation experiments on both the House and Senate, for
a range of Congresses.



APPENDIX B. CORPUS PREPARATION 



B.1. Issue labels

In the empirical analysis, we used issue labels obtained from the Congressional Research
Service. There were 5,861 labels, ranging from World Wide Web to Age. We only used issue
labels which were applied to at least twenty-five bills in the 12 years under consideration.
This filter resulted in seventy-four labels which correspond fairly well to political issues.
These issues, and the number of documents each label was applied to, are given in Table B.2.
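
The label filter amounts to a simple count. The sketch below assumes the labels are available as a mapping from a bill identifier to its list of labels; that data layout, the function name, and the toy data are hypothetical.

```python
from collections import Counter


def frequent_labels(bill_labels, min_bills=25):
    """Keep labels applied to at least `min_bills` distinct bills."""
    counts = Counter(
        label for labels in bill_labels.values() for label in set(labels)
    )
    return {label: n for label, n in counts.items() if n >= min_bills}


# Toy data: "Congress" is applied to 30 bills, "Age" to only 3 of them.
toy = {f"bill{i}": ["Congress"] for i in range(30)}
for i in range(3):
    toy[f"bill{i}"].append("Age")
print(frequent_labels(toy))
```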

B.2. Vocabulary selection 

In this section we provide further details of vocabulary selection. 
Figure B.2: Issue labels and the number of documents with each label (as assigned by the
Congressional Research Service) for Congresses 106 to 111 (1999 to 2010).

Issue label                                               No. bills  Issue label                         No. bills
Women                                                     25         Europe                              44
Military history                                          25         Military personnel and dependents   44
Civil rights                                              25         Taxation                            47
Government buildings; facilities; and property            26         Government operations and politics  47
Terrorism                                                 26         Postal facilities                   47
Energy                                                    26         Medicine                            48
Crime and law enforcement                                 27         Transportation                      48
Congressional sessions                                    27         Emergency management                48
East Asia                                                 28         Sports                              52
Appropriations                                            28         Families                            53
Business                                                  29         Medical care                        54
Congressional reporting requirements                      30         Athletes                            56
Congressional oversight                                   30         Land transfers                      56
Special weeks                                             31         Armed forces and national security  56
Social services                                           31         Natural resources                   58
Health                                                    33         Law                                 60
Special days                                              33         History                             61
California                                                33         Names                               62
Social work; volunteer service; charitable organizations  33         Criminal justice                    62
State and local government                                34         Communications                      65
Civil liberties                                           35         Public lands                        68
Government information and archives                       35         Legislative rules and procedure     69
Presidents                                                35         Elementary and secondary education  74
Government employees                                      35         Anniversaries                       82
Executive departments                                     35         Armed forces                        83
Racial and ethnic relations                               36         Defense policy                      92
Sports and recreation                                     36         Higher education                    103
Labor                                                     36         Foreign policy                      104
Special months                                            39         International affairs               105
Children                                                  40         Budgets                             112
Veterans                                                  40         Education                           122
Human rights                                              41         House of Representatives            142
Finance                                                   41         Commemorative events and holidays   195
Religion                                                  42         House rules and procedure           329
Politics and government                                   43         Commemorations                      400
Minorities                                                44         Congressional tributes              541
Public lands and natural resources                        44         Congress                            693

We first converted the words in each bill to a canonical form using the Tree-tagger
part-of-speech tagger (Schmid 1994). Next we counted all phrases with one to five words. From
these, we immediately eliminated phrases which occurred in more than 10% of bills or in
fewer than 4 bills, or which occurred as fewer than 0.001% of all phrases. This resulted in a
list of 40,603 phrases (called n-grams in natural language processing).
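
The counting and filtering step can be sketched as follows, assuming each bill has already been reduced to a list of canonical tokens. The function names and the toy corpus are hypothetical; the thresholds are the ones stated above, with 0.001% written as the fraction 0.00001.

```python
from collections import Counter


def ngrams(tokens, max_len=5):
    """Yield every phrase of one to `max_len` consecutive tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def select_phrases(bills, max_doc_frac=0.10, min_docs=4, min_phrase_frac=0.00001):
    """Drop phrases that are too common, too rare, or too small a share of all phrases."""
    doc_freq = Counter()     # number of bills containing each phrase
    total_count = Counter()  # number of occurrences of each phrase
    for tokens in bills:
        phrases = list(ngrams(tokens))
        total_count.update(phrases)
        doc_freq.update(set(phrases))
    all_phrases = sum(total_count.values())
    kept = []
    for phrase, df in doc_freq.items():
        if df > max_doc_frac * len(bills) or df < min_docs:
            continue
        if total_count[phrase] < min_phrase_frac * all_phrases:
            continue
        kept.append(phrase)
    return sorted(kept)


# Toy corpus of 50 bills: "the" is everywhere, "rare" is in 2 bills, "postal facility" in 4.
toy_bills = [["the"] for _ in range(44)]
toy_bills += [["the", "postal", "facility"] for _ in range(4)]
toy_bills += [["the", "rare"] for _ in range(2)]
print(select_phrases(toy_bills))
```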

We then used a set of features characterizing each word to classify whether it was good
or bad to use in the vocabulary. Some of these features were based on corpus statistics, such
as the number of bills in which a word appeared. Other features used external data sources,
including whether, and how frequently, a word appeared as link text in a Wikipedia article.
We estimated weights for these features using a logistic regression classifier. To train this
classifier, we used a manually curated list of 458 "bad" phrases which were semantically
awkward or meaningless (such as "the follow bill", "and sec amend", "to a study", and "pr"). These were
selected as negative examples in a penalized logistic regression, while we considered the
remaining words "good" words. We illustrate weights for these features in Figure B.3.
The best 5,000 phrases under this model were used in the vocabulary.
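
Ranking phrases with such a classifier reduces to scoring each phrase and keeping the highest scores. The sketch below uses four of the weights listed in Figure B.3 and assumes a zero intercept, which does not affect the ranking; the candidate phrases and their feature values are hypothetical, and this is not the fitted model itself.

```python
import math

# Four weights copied from Figure B.3; all other features are ignored here.
WEIGHTS = {
    "anchortext.presentTRUE": 1.730,
    "has.titleTRUE": -0.841,
    "number.terms1": -1.623,
    "number.terms2": 2.241,
}


def good_phrase_probability(features):
    """Logistic score of a phrase from its feature values (assumed zero intercept)."""
    z = sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def top_phrases(phrase_features, k):
    """Return the k phrases with the highest scores."""
    ranked = sorted(
        phrase_features,
        key=lambda phrase: good_phrase_probability(phrase_features[phrase]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates with hand-set indicator features.
candidates = {
    "national guard": {"number.terms2": 1, "anchortext.presentTRUE": 1},
    "title": {"number.terms1": 1, "has.titleTRUE": 1},
    "pr": {"number.terms1": 1},
}
print(top_phrases(candidates, k=2))
```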
Coefficient                            Summary                                                                            Weight
log(count + 1)                         Frequency of phrase in corpus                                                      -0.018
log(number.docs + 1)                   Number of bills containing phrase                                                   0.793
anchortext.presentTRUE                 Occurs as anchortext in Wikipedia                                                   1.730
anchortext                             Frequency of appearing as anchortext in Wikipedia                                   1.752
frequency.sum.div.number.docs          Frequency divided by number of bills                                               [illegible]
doc.sq                                 Number of bills containing phrase, squared                                         -0.294
has.secTRUE                            Contains the phrase sec                                                            [illegible]
has.parTRUE                            Contains the phrase paragra                                                        [illegible]
has.strikTRUE                          Contains the phrase strik                                                          [illegible]
has.amendTRUE                          Contains the phrase amend                                                          [illegible]
has.insTRUE                            Contains the phrase insert                                                         -0.727
has.clauseTRUE                         Contains the phrase clause                                                         -0.268
has.provisionTRUE                      Contains the phrase provision                                                      -0.432
has.titleTRUE                          Contains the phrase title                                                          -0.841
test.pos                               ln(max(-test, 0) + 1)                                                               0.091
test.zeroTRUE                          1 if test = 0                                                                      -1.623
test.neg                               ln(max(test, 0) + 1)                                                                0.060
number.terms1                          Number of terms in phrase is 1                                                     -1.623
number.terms2                          Number of terms in phrase is 2                                                      2.241
number.terms3                          Number of terms in phrase is 3                                                      0.315
number.terms4                          Number of terms in phrase is 4                                                     -0.478
number.terms5                          Number of terms in phrase is 5                                                     -0.454
log(number.docs + 1) * anchortext      ln(Number of bills containing phrase) x {Appears in Wikipedia anchortext}          -0.118
log(count + 1) * log(number.docs + 1)  ln(Number of bills containing phrase + 1) x ln(Frequency of phrase in corpus + 1)   0.246

Figure B.3: Features and coefficients used for predicting "good" phrases. Below, test is a test
statistic which measures deviation from a model assuming that words appear independently;
large values indicate that they occur more often than expected by chance. We define it as

$$\text{test} = \frac{\text{Observed count} - \text{Expected count}}{\sqrt{\text{Expected count under a language model assuming independence}}}.$$
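
As a worked illustration of this statistic, the sketch below computes it for a two-word phrase. The expected count assumes a simple unigram independence model (the product of the two word frequencies divided by the corpus size); the caption does not specify the language model, so this choice, the function names, and the counts are assumptions.

```python
import math


def expected_bigram_count(count_first, count_second, total_tokens):
    """Expected count of a two-word phrase if its words occurred independently."""
    return count_first * count_second / total_tokens


def independence_test_statistic(observed_count, expected_count):
    """(Observed - Expected) / sqrt(Expected); large values suggest a real phrase."""
    return (observed_count - expected_count) / math.sqrt(expected_count)


# Hypothetical counts: two words seen 100 and 50 times in 10,000 tokens,
# and seen together as a phrase 8 times.
expected = expected_bigram_count(100, 50, 10000)
print(expected, independence_test_statistic(8, expected))
```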